Postgresql
postgres_fdw 性能很慢
以下對外部的查詢大約需要 5 秒才能在 320 萬行上執行:
SELECT x."IncidentTypeCode", COUNT(x."IncidentTypeCode") FROM "IntterraNearRealTimeUnitReflexes300sForeign" x WHERE x."IncidentDateTime" >= '05/01/2016' GROUP BY x."IncidentTypeCode" ORDER BY 1;
當我在普通表上執行相同的查詢時,它會在 0.6 秒內返回。執行計劃完全不同:
普通表
Sort (cost=226861.20..226861.21 rows=4 width=4) (actual time=646.447..646.448 rows=7 loops=1) Sort Key: "IncidentTypeCode" Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=226861.12..226861.16 rows=4 width=4) (actual time=646.433..646.434 rows=7 loops=1) Group Key: "IncidentTypeCode" -> Bitmap Heap Scan on "IntterraNearRealTimeUnitReflexes300s" x (cost=10597.63..223318.41 rows=708542 width=4) (actual time=74.593..342.110 rows=709376 loops=1) Recheck Cond: ("IncidentDateTime" >= '2016-05-01 00:00:00'::timestamp without time zone) Rows Removed by Index Recheck: 12259 Heap Blocks: exact=27052 lossy=26888 -> Bitmap Index Scan on idx_incident_date_time_300 (cost=0.00..10420.49 rows=708542 width=0) (actual time=69.722..69.722 rows=709376 loops=1) Index Cond: ("IncidentDateTime" >= '2016-05-01 00:00:00'::timestamp without time zone) Planning time: 0.165 ms Execution time: 646.512 ms
外表
Sort (cost=241132.04..241132.05 rows=4 width=4) (actual time=4782.110..4782.112 rows=7 loops=1) Sort Key: "IncidentTypeCode" Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=241131.96..241132.00 rows=4 width=4) (actual time=4782.097..4782.100 rows=7 loops=1) Group Key: "IncidentTypeCode" -> Foreign Scan on "IntterraNearRealTimeUnitReflexes300sForeign" x (cost=10697.63..237589.25 rows=708542 width=4) (actual time=1.916..4476.946 rows=709376 loops=1) Planning time: 1.413 ms Execution time: 4782.660 ms
我認為我為該
GROUP BY
條款付出了高昂的代價,當我EXPLAIN VERBOSE
:SELECT "IncidentTypeCode" FROM PUBLIC ."IntterraNearRealTimeUnitReflexes300s" WHERE ( ( "IncidentDateTime" >= '2016-05-01 00:00:00' :: TIMESTAMP WITHOUT TIME ZONE ) )
這將返回 700k 行。有沒有解決的辦法?
昨天我花了很多時間閱讀這個文件頁面,並認為我已經找到了設置
use_remote_estimate
為 true 的答案,但它沒有任何效果。如有必要,我確實可以訪問外部伺服器來創建對象。子句中的時間戳值
WHERE
可以是任何值;它不是來自預定義值的列表。
如果您使用
use_remote_estimate
一定要執行ANALYZE外部表(我看到估計與返回的表非常接近,您可能已經這樣做了)。此外,下推改進在 <9.5 版本中不可用。我還假設您在遠端伺服器上也有相同的表結構(包括索引)。如果由於基數低而需要點陣圖,由於下推機制的限制,它不會使用索引。您可能希望減少返回的行數以強制執行 BTREE 索引掃描(時間戳範圍)。不幸的是,如果過濾器返回表中 +10% 的行(如果計劃者認為掃描整個表比查找讀取更便宜,則可能會改變這個百分比),那麼沒有乾淨的方法可以避免遠端伺服器上的 SeqScan。如果您使用的是 SSD,您可能會發現調整有用random_page_cost
)。您可以使用 CTE 來隔離 GROUP BY 行為:
WITH atable AS ( SELECT "IncidentTypeCode" FROM PUBLIC ."IntterraNearRealTimeUnitReflexes300s" WHERE ("IncidentDateTime" BETWEEN '2016-05-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE AND '2016-05-02 00:00:00'::TIMESTAMP WITHOUT TIME ZONE) ) SELECT atable."IncidentTypeCode", COUNT(atable.IncidentTypeCode) FROM atable GROUP BY atable."IncidentTypeCode" ORDER BY atable."IncidentTypeCode";