Postgresql
Postgresql 多次擴展視圖而不是執行一次
我有一個較長的子查詢,我將其儲存為視圖
CREATE VIEW scans_minmax AS SELECT tmp.source, value ->> 'Ok' AS "Ok", value ->> 'NotOk' AS "NotOk", value ->> 'TotalCount' AS "TotalCount", (CASE WHEN scans.timestamp = tmp.min_t THEN 'MIN' ELSE 'MAX' END) AS "Aggregation" FROM (SELECT source, MIN(timestamp) AS min_t, MAX(timestamp) AS max_t FROM scans WHERE timestamp > NOW() - INTERVAL '30 days' AND value ->> 'Ok' IS NOT NULL AND value ->> 'NotOk' IS NOT NULL AND value ->> 'TotalCount' IS NOT NULL GROUP BY source) AS tmp, scans WHERE scans.source = tmp.source AND (scans.timestamp = tmp.min_t OR scans.timestamp = tmp.max_t)
結果這給了我一張漂亮的桌子
| source | Ok | NotOk | TotalCount | Aggregation | -------------------------------------------------- | A | a | b | c | MIN | | A | x | y | z | MAX |
但是因為我希望表也“轉置”(MIN / MAX 作為每個值的相鄰列)我創建了查詢
SELECT min_tbl.source, min_tbl."Ok" AS "min_ok", max_tbl."Ok" AS "max_ok", min_tbl."NotOk" AS "min_not_ok", max_tbl."NotOk" AS "max_not_ok", min_tbl."TotalCount" AS "min_total_count", max_tbl."TotalCount" AS "max_total_count" FROM scans_minmax AS min_tbl JOIN scans_minmax AS max_tbl ON min_tbl.source = max_tbl.source WHERE min_tbl."Aggregation" = 'MIN' AND max_tbl."Aggregation" = 'MAX'
這完全正確。
現在,我的問題如下:上面的視圖有 16 個條目(我的數據庫中有 8 個不同的源條目)。所以第二個查詢應該和一個簡單的
SELECT * FROM scans_minmax
. 但事實證明這需要更多時間,因為優化器不會執行一次擴展視圖查詢,而是簡單地操作這個臨時表,而是將視圖擴展兩次,從而進行兩次昂貴的聚合。EXPLAIN ANALYZE 對上面的查詢給出
Nested Loop (cost=173527.09..173543.19 rows=1 width=231) (actual time=2594.697..2594.852 rows=8 loops=1) -> Merge Join (cost=173518.21..173518.28 rows=1 width=617) (actual time=2594.580..2594.610 rows=8 loops=1) Merge Cond: ((scans_1.source)::text = (scans_3.source)::text) -> Sort (cost=86883.50..86883.50 rows=1 width=562) (actual time=1292.558..1292.561 rows=8 loops=1) Sort Key: scans_2.source Sort Method: quicksort Memory: 31kB -> Nested Loop (cost=86643.23..86883.49 rows=1 width=562) (actual time=1292.406..1292.519 rows=8 loops=1) -> HashAggregate (cost=86634.35..86634.45 rows=10 width=55) (actual time=1292.281..1292.285 rows=8 loops=1) Group Key: scans_2.source -> Index Scan using scans_timestamp_index on scans scans_2 (cost=0.43..83024.66 rows=481292 width=47) (actual time=0.174..1036.297 rows=545244 loops=1) Index Cond: ("timestamp" > (now() - '30 days'::interval)) Filter: (((value ->> 'Ok'::text) IS NOT NULL) AND ((value ->> 'NotOk'::text) IS NOT NULL) AND ((value ->> 'TotalCount'::text) IS NOT NULL)) Rows Removed by Filter: 98 -> Bitmap Heap Scan on scans scans_1 (cost=8.88..24.88 rows=1 width=531) (actual time=0.025..0.026 rows=1 loops=8) Recheck Cond: (("timestamp" = (min(scans_2."timestamp"))) OR ("timestamp" = (max(scans_2."timestamp")))) Filter: (((scans_2.source)::text = (source)::text) AND (CASE WHEN ("timestamp" = (min(scans_2."timestamp"))) THEN 'MIN'::text ELSE 'MAX'::text END = 'MAX'::text)) Rows Removed by Filter: 1 Heap Blocks: exact=16 -> BitmapOr (cost=8.88..8.88 rows=4 width=0) (actual time=0.018..0.018 rows=0 loops=8) -> Bitmap Index Scan on asdf (cost=0.00..4.44 rows=2 width=0) (actual time=0.010..0.010 rows=1 loops=8) Index Cond: ("timestamp" = (min(scans_2."timestamp"))) -> Bitmap Index Scan on asdf (cost=0.00..4.44 rows=2 width=0) (actual time=0.006..0.006 rows=1 loops=8) Index Cond: ("timestamp" = (max(scans_2."timestamp"))) -> Sort (cost=86634.71..86634.74 rows=10 width=55) (actual time=1302.016..1302.016 rows=8 loops=1) Sort Key: scans_3.source Sort Method: quicksort Memory: 26kB -> HashAggregate (cost=86634.35..86634.45 rows=10 width=55) (actual time=1301.967..1301.969 rows=8 loops=1) Group Key: scans_3.source -> Index Scan using scans_timestamp_index on scans scans_3 (cost=0.43..83024.66 rows=481292 width=47) (actual time=0.080..1042.757 rows=545244 loops=1) Index Cond: ("timestamp" > (now() - '30 days'::interval)) Filter: (((value ->> 'Ok'::text) IS NOT NULL) AND ((value ->> 'NotOk'::text) IS NOT NULL) AND ((value ->> 'TotalCount'::text) IS NOT NULL)) Rows Removed by Filter: 98 -> Bitmap Heap Scan on scans (cost=8.88..24.88 rows=1 width=531) (actual time=0.023..0.024 rows=1 loops=8) Recheck Cond: (("timestamp" = (min(scans_3."timestamp"))) OR ("timestamp" = (max(scans_3."timestamp")))) Filter: (((scans_3.source)::text = (source)::text) AND (CASE WHEN ("timestamp" = (min(scans_3."timestamp"))) THEN 'MIN'::text ELSE 'MAX'::text END = 'MIN'::text)) Rows Removed by Filter: 1 Heap Blocks: exact=16 -> BitmapOr (cost=8.88..8.88 rows=4 width=0) (actual time=0.016..0.016 rows=0 loops=8) -> Bitmap Index Scan on asdf (cost=0.00..4.44 rows=2 width=0) (actual time=0.009..0.009 rows=1 loops=8) Index Cond: ("timestamp" = (min(scans_3."timestamp"))) -> Bitmap Index Scan on asdf (cost=0.00..4.44 rows=2 width=0) (actual time=0.006..0.006 rows=1 loops=8) Index Cond: ("timestamp" = (max(scans_3."timestamp"))) Planning time: 2.253 ms Execution time: 2594.994 ms
為什麼會這樣,有沒有辦法避免這種行為?我想要優化器做的就是將 VIEW 評估為臨時表,並在我的第二個查詢中對這個臨時表進行連接。
我正在使用 PostgreSQL 9.6.15。
已經謝謝了!
這是預期的行為:視圖源被查詢重寫簡單地替換為它的名稱,然後像往常一樣分析和執行。
要告訴查詢編譯器您只想在使用公用表表達式(CTE) 後處理視圖:
WITH scans_minmax (source, "Ok", "NotOk", "TotalCount", "Aggregation") AS ( SELECT tmp.source, value ->> 'Ok', value ->> 'NotOk', value ->> 'TotalCount', (CASE WHEN scans.timestamp = tmp.min_t THEN 'MIN' ELSE 'MAX' END) FROM (SELECT source, MIN(timestamp) AS min_t, MAX(timestamp) AS max_t FROM scans WHERE timestamp > NOW() - INTERVAL '30 days' AND value ->> 'Ok' IS NOT NULL AND value ->> 'NotOk' IS NOT NULL AND value ->> 'TotalCount' IS NOT NULL GROUP BY source) AS tmp, scans WHERE scans.source = tmp.source AND (scans.timestamp = tmp.min_t OR scans.timestamp = tmp.max_t) ) SELECT min_tbl.source, min_tbl."Ok" AS "min_ok", max_tbl."Ok" AS "max_ok", min_tbl."NotOk" AS "min_not_ok", max_tbl."NotOk" AS "max_not_ok", min_tbl."TotalCount" AS "min_total_count", max_tbl."TotalCount" AS "max_total_count" FROM scans_minmax AS min_tbl JOIN scans_minmax AS max_tbl ON min_tbl.source = max_tbl.source WHERE min_tbl."Aggregation" = 'MIN' AND max_tbl."Aggregation" = 'MAX'
Postgres 將實現 CTE 的結果(無論如何在 Postgres 12 之前),導致您的聚合只執行一次。
請注意,通常在使用 CTE 時需要謹慎:
- 這並不一定會使您的整體查詢執行得更快,因為 CTE 物化可能需要額外的伺服器資源——但可能不是您的情況。
- CTE 是一個優化柵欄,這意味著來自外部選擇的謂詞不會被推入 CTE,這在某些情況下會導致次優的執行計劃——同樣,在你的情況下不是這樣。