Postgresql

Postgresql 多次擴展視圖而不是執行一次

  • November 28, 2019

我有一個較長的子查詢,我將其儲存為視圖

CREATE VIEW scans_minmax AS
SELECT tmp.source,
      value ->> 'Ok'                                                    AS "Ok",
      value ->> 'NotOk'                                                 AS "NotOk",
      value ->> 'TotalCount'                                            AS "TotalCount",
      (CASE WHEN scans.timestamp = tmp.min_t THEN 'MIN' ELSE 'MAX' END) AS "Aggregation"
FROM (SELECT source, MIN(timestamp) AS min_t, MAX(timestamp) AS max_t
     FROM scans
     WHERE timestamp > NOW() - INTERVAL '30 days'
       AND value ->> 'Ok' IS NOT NULL
       AND value ->> 'NotOk' IS NOT NULL
       AND value ->> 'TotalCount' IS NOT NULL
     GROUP BY source) AS tmp,
    scans
WHERE scans.source = tmp.source
 AND (scans.timestamp = tmp.min_t OR scans.timestamp = tmp.max_t)

結果這給了我一張漂亮的桌子

| source | Ok | NotOk | TotalCount | Aggregation |
--------------------------------------------------
| A      | a  | b     | c          | MIN         |
| A      | x  | y     | z          | MAX         |

但是因為我希望表也“轉置”(MIN / MAX 作為每個值的相鄰列)我創建了查詢

SELECT min_tbl.source,
      min_tbl."Ok" AS "min_ok", max_tbl."Ok" AS "max_ok",
      min_tbl."NotOk" AS "min_not_ok", max_tbl."NotOk" AS "max_not_ok",
      min_tbl."TotalCount" AS "min_total_count", max_tbl."TotalCount" AS "max_total_count"
FROM scans_minmax AS min_tbl JOIN
    scans_minmax AS max_tbl ON min_tbl.source = max_tbl.source
WHERE min_tbl."Aggregation" = 'MIN' AND max_tbl."Aggregation" = 'MAX'

這完全正確。

現在,我的問題如下:上面的視圖有 16 個條目(我的數據庫中有 8 個不同的源條目)。所以第二個查詢應該和一個簡單的SELECT * FROM scans_minmax. 但事實證明這需要更多時間,因為優化器不會執行一次擴展視圖查詢,而是簡單地操作這個臨時表,而是將視圖擴展兩次,從而進行兩次昂貴的聚合。

EXPLAIN ANALYZE 對上面的查詢給出

Nested Loop  (cost=173527.09..173543.19 rows=1 width=231) (actual time=2594.697..2594.852 rows=8 loops=1)
 ->  Merge Join  (cost=173518.21..173518.28 rows=1 width=617) (actual time=2594.580..2594.610 rows=8 loops=1)
       Merge Cond: ((scans_1.source)::text = (scans_3.source)::text)
       ->  Sort  (cost=86883.50..86883.50 rows=1 width=562) (actual time=1292.558..1292.561 rows=8 loops=1)
             Sort Key: scans_2.source
             Sort Method: quicksort  Memory: 31kB
             ->  Nested Loop  (cost=86643.23..86883.49 rows=1 width=562) (actual time=1292.406..1292.519 rows=8 loops=1)
                   ->  HashAggregate  (cost=86634.35..86634.45 rows=10 width=55) (actual time=1292.281..1292.285 rows=8 loops=1)
                         Group Key: scans_2.source
                         ->  Index Scan using scans_timestamp_index on scans scans_2  (cost=0.43..83024.66 rows=481292 width=47) (actual time=0.174..1036.297 rows=545244 loops=1)
                               Index Cond: ("timestamp" > (now() - '30 days'::interval))
                               Filter: (((value ->> 'Ok'::text) IS NOT NULL) AND ((value ->> 'NotOk'::text) IS NOT NULL) AND ((value ->> 'TotalCount'::text) IS NOT NULL))
                               Rows Removed by Filter: 98
                   ->  Bitmap Heap Scan on scans scans_1  (cost=8.88..24.88 rows=1 width=531) (actual time=0.025..0.026 rows=1 loops=8)
                         Recheck Cond: (("timestamp" = (min(scans_2."timestamp"))) OR ("timestamp" = (max(scans_2."timestamp"))))
                         Filter: (((scans_2.source)::text = (source)::text) AND (CASE WHEN ("timestamp" = (min(scans_2."timestamp"))) THEN 'MIN'::text ELSE 'MAX'::text END = 'MAX'::text))
                         Rows Removed by Filter: 1
                         Heap Blocks: exact=16
                         ->  BitmapOr  (cost=8.88..8.88 rows=4 width=0) (actual time=0.018..0.018 rows=0 loops=8)
                               ->  Bitmap Index Scan on asdf  (cost=0.00..4.44 rows=2 width=0) (actual time=0.010..0.010 rows=1 loops=8)
                                     Index Cond: ("timestamp" = (min(scans_2."timestamp")))
                               ->  Bitmap Index Scan on asdf  (cost=0.00..4.44 rows=2 width=0) (actual time=0.006..0.006 rows=1 loops=8)
                                     Index Cond: ("timestamp" = (max(scans_2."timestamp")))
       ->  Sort  (cost=86634.71..86634.74 rows=10 width=55) (actual time=1302.016..1302.016 rows=8 loops=1)
             Sort Key: scans_3.source
             Sort Method: quicksort  Memory: 26kB
             ->  HashAggregate  (cost=86634.35..86634.45 rows=10 width=55) (actual time=1301.967..1301.969 rows=8 loops=1)
                   Group Key: scans_3.source
                   ->  Index Scan using scans_timestamp_index on scans scans_3  (cost=0.43..83024.66 rows=481292 width=47) (actual time=0.080..1042.757 rows=545244 loops=1)
                         Index Cond: ("timestamp" > (now() - '30 days'::interval))
                         Filter: (((value ->> 'Ok'::text) IS NOT NULL) AND ((value ->> 'NotOk'::text) IS NOT NULL) AND ((value ->> 'TotalCount'::text) IS NOT NULL))
                         Rows Removed by Filter: 98
 ->  Bitmap Heap Scan on scans  (cost=8.88..24.88 rows=1 width=531) (actual time=0.023..0.024 rows=1 loops=8)
       Recheck Cond: (("timestamp" = (min(scans_3."timestamp"))) OR ("timestamp" = (max(scans_3."timestamp"))))
       Filter: (((scans_3.source)::text = (source)::text) AND (CASE WHEN ("timestamp" = (min(scans_3."timestamp"))) THEN 'MIN'::text ELSE 'MAX'::text END = 'MIN'::text))
       Rows Removed by Filter: 1
       Heap Blocks: exact=16
       ->  BitmapOr  (cost=8.88..8.88 rows=4 width=0) (actual time=0.016..0.016 rows=0 loops=8)
             ->  Bitmap Index Scan on asdf  (cost=0.00..4.44 rows=2 width=0) (actual time=0.009..0.009 rows=1 loops=8)
                   Index Cond: ("timestamp" = (min(scans_3."timestamp")))
             ->  Bitmap Index Scan on asdf  (cost=0.00..4.44 rows=2 width=0) (actual time=0.006..0.006 rows=1 loops=8)
                   Index Cond: ("timestamp" = (max(scans_3."timestamp")))
Planning time: 2.253 ms
Execution time: 2594.994 ms

為什麼會這樣,有沒有辦法避免這種行為?我想要優化器做的就是將 VIEW 評估為臨時表,並在我的第二個查詢中對這個臨時表進行連接。

我正在使用 PostgreSQL 9.6.15。

已經謝謝了!

這是預期的行為:視圖源被查詢重寫簡單地替換為它的名稱,然後像往常一樣分析和執行。

要告訴查詢編譯器您只想在使用公用表表達式(CTE) 後處理視圖:

WITH scans_minmax (source, "Ok", "NotOk", "TotalCount", "Aggregation") AS (
SELECT tmp.source,
      value ->> 'Ok',
      value ->> 'NotOk',
      value ->> 'TotalCount',
      (CASE WHEN scans.timestamp = tmp.min_t THEN 'MIN' ELSE 'MAX' END)
FROM (SELECT source, MIN(timestamp) AS min_t, MAX(timestamp) AS max_t
     FROM scans
     WHERE timestamp > NOW() - INTERVAL '30 days'
       AND value ->> 'Ok' IS NOT NULL
       AND value ->> 'NotOk' IS NOT NULL
       AND value ->> 'TotalCount' IS NOT NULL
     GROUP BY source) AS tmp,
    scans
WHERE scans.source = tmp.source
 AND (scans.timestamp = tmp.min_t OR scans.timestamp = tmp.max_t)
)
SELECT min_tbl.source,
      min_tbl."Ok" AS "min_ok", max_tbl."Ok" AS "max_ok",
      min_tbl."NotOk" AS "min_not_ok", max_tbl."NotOk" AS "max_not_ok",
      min_tbl."TotalCount" AS "min_total_count", max_tbl."TotalCount" AS "max_total_count"
FROM scans_minmax AS min_tbl JOIN
    scans_minmax AS max_tbl ON min_tbl.source = max_tbl.source
WHERE min_tbl."Aggregation" = 'MIN' AND max_tbl."Aggregation" = 'MAX'

Postgres 將實現 CTE 的結果(無論如何在 Postgres 12 之前),導致您的聚合只執行一次。

請注意,通常在使用 CTE 時需要謹慎:

  1. 這並不一定會使您的整體查詢執行得更快,因為 CTE 物化可能需要額外的伺服器資源——但可能不是您的情況。
  2. CTE 是一個優化柵欄,這意味著來自外部選擇的謂詞不會被推入 CTE,這在某些情況下會導致次優的執行計劃——同樣,在你的情況下不是這樣。

引用自:https://dba.stackexchange.com/questions/254392