PostgreSQL

Suboptimal query plan when updating a partitioned table

  • February 4, 2019

Background

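  • I have a simple CTE that is used to update a declaratively partitioned table.

  • The subquery runs quickly (1.7 sec per EXPLAIN ANALYZE) and returns 3,769 records (CTE y in the query below).

  • The UPDATE is meant to set a non-indexed column of the declaratively partitioned table. Some characteristics of the table:

    • Contains 21 million records
    • 184 partitions (yes, too many), with primary/foreign keys and indexes on every child partition
    • fillfactor=100 (I plan to lower it so that HOT updates on the same page become possible, but I am not sure how the lack of free page space affects the query plan)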
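For the planned fillfactor change, a sketch for a single partition might look like the following (the demo_ff_* names are hypothetical, not from the question). Lowering fillfactor only reserves free space in pages written afterwards; existing pages are repacked only when the table is rewritten, for example by CLUSTER or VACUUM FULL, both of which take an ACCESS EXCLUSIVE lock. HOT updates additionally require that no indexed column changes, which holds here because the updated column is not indexed.

```sql
-- Hypothetical toy partition; demo_ff_* names are placeholders.
DROP TABLE IF EXISTS demo_ff_parent CASCADE;

CREATE TABLE demo_ff_parent (
  p1 integer NOT NULL,
  p2 timestamp without time zone NOT NULL,
  a1 numeric
) PARTITION BY RANGE (p2);

CREATE TABLE demo_ff_2019_01 PARTITION OF demo_ff_parent
  FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
ALTER TABLE demo_ff_2019_01
  ADD CONSTRAINT demo_ff_2019_01_pk PRIMARY KEY (p1, p2);

-- Leave about 10% of each heap page free for future HOT updates.
-- This setting is set per partition, not on the partitioned parent.
ALTER TABLE demo_ff_2019_01 SET (fillfactor = 90);

-- Rewrite the partition so existing pages honour the new fillfactor;
-- this also marks the primary key index as the clustering index.
CLUSTER demo_ff_2019_01 USING demo_ff_2019_01_pk;
```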

After the subquery (via the CTE) has executed, the UPDATE runs into trouble (several nested loops of hash joins), as this abridged query plan (EXPLAIN) shows:

-> Hash Join /* Run for each child partition, based on CTE PK = child table PK */
    -> Nested Loop
         -> CTE Scan
         -> Append
              -> Index Scan /* for index on each child partition */
                 ... /* Index scans (for each child partition) */
    -> Hash
         -> Seq Scan /* on child table */
  ... /* Hash Joins (for each child partition) */

Query

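The following query is the UPDATE statement that causes the problem. Basically, the query runs several functions over values from parent_table that cannot be nested into a single SQL statement (hence the two CTEs), and then uses the results to UPDATE the same parent_table (the functions are expensive, so their results are stored in the table itself).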

WITH x AS (
 SELECT t."p1", t."p2", f(t."b1") OVER "win_x" AS "c1"
 FROM parent_table AS "t"
 WHERE t."p1" IN ('val1','val2')
 WINDOW "win_x" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), y AS (
 SELECT x."p1", x."p2", f(x."c1") OVER "win_y" AS "c2"
 FROM x
 WINDOW "win_y" AS (PARTITION BY "p1" ORDER BY "p1","p2")
)
UPDATE parent_table AS "t2"
SET ("a1")=(t."b2"*y."c2")
FROM y INNER JOIN parent_table AS "t" USING ("p1","p2")
WHERE t2."p1"=y."p1" AND t2."p2"=y."p2";
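In PostgreSQL 10, an UPDATE on a partitioned table is planned once per partition, so every join in its FROM clause is repeated for each of the 184 children; because of the self-join on parent_table, each of those repetitions also appends every partition again. The sketch below reproduces that shape on a hypothetical two-partition toy table; the demo_* names, the sample data and the use of the built-in sum() in place of the expensive f() are assumptions, not part of the original setup.

```sql
-- Hypothetical toy reproduction; demo_* names and data are placeholders.
DROP TABLE IF EXISTS demo_parent CASCADE;

CREATE TABLE demo_parent (
  p1 integer NOT NULL,
  p2 timestamp without time zone NOT NULL,
  a1 numeric,
  b1 numeric,
  b2 numeric
) PARTITION BY RANGE (p2);

-- Two monthly partitions stand in for the 184 real ones.
CREATE TABLE demo_2019_01 PARTITION OF demo_parent
  FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
CREATE TABLE demo_2019_02 PARTITION OF demo_parent
  FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');

-- PostgreSQL 10 cannot put a primary key on the partitioned parent,
-- so each partition gets its own.
ALTER TABLE demo_2019_01 ADD PRIMARY KEY (p1, p2);
ALTER TABLE demo_2019_02 ADD PRIMARY KEY (p1, p2);

-- 1,000 hourly rows spanning both partitions; p1 alternates between 1 and 2.
INSERT INTO demo_parent (p1, p2, b1, b2)
SELECT g % 2 + 1, timestamp '2019-01-01' + g * interval '1 hour', g, 2
FROM generate_series(0, 999) AS g;

-- Same shape as the original UPDATE: y is joined back to demo_parent (alias t).
-- sum() is only a stand-in for the expensive window function f().
WITH x AS (
  SELECT t.p1, t.p2, sum(t.b1) OVER win_x AS c1
  FROM demo_parent AS t
  WHERE t.p1 IN (1, 2)
  WINDOW win_x AS (PARTITION BY p1 ORDER BY p1, p2)
), y AS (
  SELECT x.p1, x.p2, sum(x.c1) OVER win_y AS c2
  FROM x
  WINDOW win_y AS (PARTITION BY p1 ORDER BY p1, p2)
)
UPDATE demo_parent AS t2
SET a1 = t.b2 * y.c2
FROM y INNER JOIN demo_parent AS t USING (p1, p2)
WHERE t2.p1 = y.p1 AND t2.p2 = y.p2;
```

Prefixing the UPDATE with EXPLAIN should show a separate join subplan under each partition's Update node, each containing an Append over all partitions for the self-join, similar to the plan excerpt in the question; exact node types depend on version and statistics.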

Table definition

The table definitions are as follows:

CREATE TABLE IF NOT EXISTS parent_table (
  p1   integer,
  p2  timestamp without time zone,
  a1  numeric,
  b1  numeric,
  b2  numeric,
  c1  numeric,
  c2 numeric,
  ... /* additional 8 columns of numeric type */
  z1 numeric,
  z2 numeric,
  CONSTRAINT lbound_z1 CHECK ("z1"::numeric >= 0),
  CONSTRAINT lbound_z2 CHECK ("z2"::numeric >= 0)
)
PARTITION BY RANGE (EXTRACT(YEAR FROM p2), EXTRACT(MONTH FROM p2))
WITH (OIDS='false')
TABLESPACE ts_ssd_raid10;

CREATE TABLE IF NOT EXISTS child_table_yyyy_mm PARTITION OF parent_table (
  CONSTRAINT child_pk PRIMARY KEY (p1, p2) WITH (FILLFACTOR='90') USING INDEX TABLESPACE ts_idx_m2ssd,
  CONSTRAINT child_fk_other_child_yyyy_mm FOREIGN KEY (p1, p2) REFERENCES other_child_yyyy_mm (p1, p2) MATCH FULL
)
FOR VALUES FROM (yyyy, mm) TO (yyyy, mm)
WITH (FILLFACTOR='90', OIDS='false')
TABLESPACE ts_ssd_raid10;
ALTER TABLE child_table_yyyy_mm CLUSTER ON "child_pk";

Question

How can I run the UPDATE without a nested-loop hash join for each of the 184 child tables?

System information

Postgres version 10.3

Using a third CTE reduced the execution time to 1-2 seconds:

WITH x AS (
 SELECT t."p1", t."p2", t."b2", f(t."b1") OVER "win_x" AS "c1"
 FROM parent_table AS "t"
 WHERE t."p1" IN ('val1','val2')
 WINDOW "win_x" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), y AS (
 SELECT x."p1", x."p2", x."b2", f(x."c1") OVER "win_y" AS "c2"
 FROM x
 WINDOW "win_y" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), z AS (
 SELECT y."p1", y."p2", y."b2"*y."c2" AS "a1"
 FROM y
)
UPDATE parent_table AS "t"
SET "a1"=z."a1"
FROM z
WHERE t."p1"=z."p1" AND t."p2"=z."p2";
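In this rewrite, the decisive change appears to be that b2 travels through the CTEs, so the UPDATE no longer joins parent_table to itself. The extra CTE z is arguably optional, since y already carries b2 and the product can be computed directly in SET. A minimal sketch of that variant on a hypothetical two-partition toy table (the demo_* names, the sample data and sum() standing in for f() are assumptions):

```sql
-- Hypothetical toy table; demo_* names and data are placeholders.
DROP TABLE IF EXISTS demo_parent CASCADE;

CREATE TABLE demo_parent (
  p1 integer NOT NULL,
  p2 timestamp without time zone NOT NULL,
  a1 numeric,
  b1 numeric,
  b2 numeric
) PARTITION BY RANGE (p2);

CREATE TABLE demo_2019_01 PARTITION OF demo_parent
  FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
CREATE TABLE demo_2019_02 PARTITION OF demo_parent
  FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');
ALTER TABLE demo_2019_01 ADD PRIMARY KEY (p1, p2);
ALTER TABLE demo_2019_02 ADD PRIMARY KEY (p1, p2);

-- 1,000 hourly rows spanning both partitions; p1 alternates between 1 and 2.
INSERT INTO demo_parent (p1, p2, b1, b2)
SELECT g % 2 + 1, timestamp '2019-01-01' + g * interval '1 hour', g, 2
FROM generate_series(0, 999) AS g;

-- b2 is carried through x and y, so the UPDATE joins only the CTE result
-- to the target; demo_parent is not scanned a second time in FROM.
-- sum() is only a stand-in for the expensive window function f().
WITH x AS (
  SELECT t.p1, t.p2, t.b2, sum(t.b1) OVER win_x AS c1
  FROM demo_parent AS t
  WHERE t.p1 IN (1, 2)
  WINDOW win_x AS (PARTITION BY p1 ORDER BY p1, p2)
), y AS (
  SELECT x.p1, x.p2, x.b2, sum(x.c1) OVER win_y AS c2
  FROM x
  WINDOW win_y AS (PARTITION BY p1 ORDER BY p1, p2)
)
UPDATE demo_parent AS t
SET a1 = y.b2 * y.c2
FROM y
WHERE t.p1 = y.p1 AND t.p2 = y.p2;
```

In PostgreSQL 10, EXPLAIN should still show one subplan per partition, but each one now joins the small CTE result against that single partition only. Whether dropping z makes a measurable difference on the real 21-million-row table would need to be measured.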

Source: https://dba.stackexchange.com/questions/228740