PostgreSQL
Suboptimal query plan when updating a partitioned table
Background
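I have a simple CTE that updates a declaratively partitioned table. The subquery executes quickly (1.7 sec per EXPLAIN ANALYZE) and returns 3,769 records (CTE y in the query below). Its result feeds an UPDATE of a non-indexed column of the partitioned table. A few characteristics of the table:
- Contains 21 million records
- 184 partitions (yes, too many)
- Each child partition has primary/foreign keys and indexes
- fillfactor=100 (I plan to reduce it so that HOT updates can stay on the same page, but I am not sure whether the lack of free page space affects the query plan)
After the subquery (via the CTEs) executes, the UPDATE itself is the problem. Based on the abbreviated query plan shown by EXPLAIN, it runs several nested loops with hash joins: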
-> Hash Join              /* Run for each child partition, based on CTE PK = child table PK */
   -> Nested Loop
      -> CTE Scan
      -> Append
         -> Index Scan    /* for index on each child partition */
         ...              /* Index scans (for each child partition) */
   -> Hash
      -> Seq Scan         /* on child table */
...                       /* Hash Joins (for each child partition) */
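A plan like the one above can be captured with real timings without keeping the changes, because EXPLAIN ANALYZE actually executes the statement it explains. The following is a minimal sketch of that approach; the table explain_demo and its columns are hypothetical stand-ins, not objects from this post.

```sql
-- Hypothetical scratch table; not the table from the post.
CREATE TEMP TABLE explain_demo (id integer PRIMARY KEY, val numeric);
INSERT INTO explain_demo SELECT g, g FROM generate_series(1, 1000) AS g;

BEGIN;
-- EXPLAIN ANALYZE really runs the UPDATE, so wrap it in a transaction.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE explain_demo SET val = val * 2 WHERE id <= 10;
-- Discard the row changes made while the timings were collected.
ROLLBACK;
```

Plain EXPLAIN (without ANALYZE) only plans the statement, so it is the safer choice when the estimated plan shape is all that is needed.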
Query
The following query is the UPDATE statement on parent_table that causes the problem. Basically, the query executes a couple of functions using a value that cannot be nested into a single SQL statement (hence the two CTEs), and the result is then used to UPDATE the same parent_table (the functions are expensive, so the results are stored in the table itself).
WITH x AS (
    SELECT t."p1", t."p2", f(t."b1") OVER "win_x" AS "c1"
    FROM parent_table AS "t"
    WHERE t."p1" IN ('val1','val2')
    WINDOW "win_x" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), y AS (
    SELECT x."p1", x."p2", f(x."c1") OVER "win_y" AS "c2"
    FROM x
    WINDOW "win_y" AS (PARTITION BY "p1" ORDER BY "p1","p2")
)
UPDATE parent_table AS "t2"
SET ("a1")=(t."b2"*y."c2")
FROM y INNER JOIN parent_table AS "t" USING ("p1","p2")
WHERE t2."p1"=y."p1" AND t2."p2"=y."p2";
Table definition
The tables are defined as follows:
CREATE TABLE IF NOT EXISTS parent_table (
    p1 integer,
    p2 timestamp without time zone,
    a1 numeric,
    b1 numeric,
    b2 numeric,
    c1 numeric,
    c2 numeric,
    ... /* additional 8 columns of numeric type */
    z1 numeric,
    z2 numeric,
    CONSTRAINT lbound_z1 CHECK ("z1"::numeric >= 0),
    CONSTRAINT lbound_z2 CHECK ("z2"::numeric >= 0)
)
PARTITION BY RANGE (EXTRACT(YEAR FROM p2), EXTRACT(MONTH FROM p2))
WITH (OIDS='false')
TABLESPACE ts_ssd_raid10;

CREATE TABLE IF NOT EXISTS child_table_yyyy_mm PARTITION OF parent_table (
    CONSTRAINT child_pk PRIMARY KEY (p1, p2)
        WITH (FILLFACTOR='90') USING INDEX TABLESPACE ts_idx_m2ssd,
    CONSTRAINT child_fk_other_child_yyyy_mm FOREIGN KEY (p1, p2)
        REFERENCES other_child_yyyy_mm (p1,p2) MATCH FULL
        WITH (FILLFACTOR='90') USING INDEX TABLESPACE ts_idx_m2ssd
)
FOR VALUES FROM (yyyy, mm) TO (yyyy, mm)
WITH (FILLFACTOR='90', OIDS='false')
TABLESPACE ts_ssd_raid10;

ALTER TABLE child_table CLUSTER ON "child_pk";
Question
How can the UPDATE be executed without a nested-loop hash join for each of the 184 child tables?
System information
Postgres version 10.3
Using a third CTE reduced the execution time to 1-2 seconds:
WITH x AS (
    SELECT t."p1", t."p2", t."b2", f(t."b1") OVER "win_x" AS "c1"
    FROM parent_table AS "t"
    WHERE t."p1" IN ('val1','val2')
    WINDOW "win_x" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), y AS (
    SELECT x."p1", x."p2", x."b2", f(x."c1") OVER "win_y" AS "c2"
    FROM x
    WINDOW "win_y" AS (PARTITION BY "p1" ORDER BY "p1","p2")
), z AS (
    SELECT y."p1", y."p2", y."b2"*y."c2" AS "a1"
    FROM y
)
UPDATE parent_table AS "t"
SET "a1"=z."a1"
FROM z
WHERE t."p1"=z."p1" AND t."p2"=z."p2";
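The visible difference from the original statement is that b2 is carried through the CTEs, so the UPDATE no longer joins parent_table a second time in its FROM clause. The miniature below is only a sketch of that pattern under assumed names: demo_parent, its two partitions, and the b2 * 10 expression (a stand-in for the expensive f(...) result) are all hypothetical.

```sql
-- Hypothetical two-partition table; names are illustrative only.
CREATE TABLE demo_parent (
    p1 integer NOT NULL,
    p2 timestamp without time zone NOT NULL,
    a1 numeric,
    b2 numeric
) PARTITION BY RANGE (p2);

CREATE TABLE demo_2018_01 PARTITION OF demo_parent
    FOR VALUES FROM ('2018-01-01') TO ('2018-02-01');
CREATE TABLE demo_2018_02 PARTITION OF demo_parent
    FOR VALUES FROM ('2018-02-01') TO ('2018-03-01');

INSERT INTO demo_parent (p1, p2, b2) VALUES
    (1, '2018-01-15', 2),
    (1, '2018-02-15', 3);

-- Carry b2 through the CTE so the UPDATE needs no extra join to demo_parent.
WITH z AS (
    SELECT p1, p2, b2 * 10 AS a1   -- stand-in for the expensive computation
    FROM demo_parent
    WHERE p1 IN (1)
)
UPDATE demo_parent AS t
SET a1 = z.a1
FROM z
WHERE t.p1 = z.p1 AND t.p2 = z.p2;
```

Prefixing the final statement with EXPLAIN should show whether the partitioned table now appears only as the update target and inside the CTE; the actual plan and timings on a 184-partition table may of course differ from this toy case.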