Postgresql

為什麼我的查詢一起執行比單獨執行它們花費更長的時間?

  • December 3, 2016

我有一系列遵循一般模式的更新語句:一個更新聚合來自另一個表(或有時是多個表)的值,下一個更新根據聚合值生成排名。此過程重複 23 次,總共 46 條更新語句。每個更新聯需要 30-40 秒才能獨立執行,但是當我通過 PgAdmin 將它們作為單個事務一起執行時,它需要一個多小時,而不是我根據單個查詢時間預期的約 15 分鐘。(我最後一次嘗試停止執行並單獨執行它們。)

如果我通過 psql 在文件中執行同一組更新,則該過程在預期的 15 分鐘時間內完成。

查詢計劃器是否有一些怪癖,會根據單個事務中執行的大量更新語句來更改執行計劃?鑑於 psql 和 PgAdmin 之間的不同行為,我認為這與查詢打包執行的方式有關,但我不夠熟悉,無法知道其中的區別。

有沒有辦法編寫我的程式碼,以便在通過 PgAdmin 作為單個事務執行時提高性能?

我在 Ubuntu 16.04 上使用 PostgreSQL 9.5。

以下是程式碼中的兩個範例對聯:

-- bike_driver_aggressive
UPDATE  generated.crash_aggregates
SET     bike_driver_aggressive = (
           SELECT  COUNT(*)
           FROM    crashes_bike2 c
           WHERE   c.int_id = crash_aggregates.int_id
           AND     c.aggressive_driverfault
       );
WITH ranks AS (
   SELECT  int_id,
           rank() OVER (ORDER BY bike_driver_aggressive DESC) AS rank
   FROM    crash_aggregates
)
UPDATE  generated.crash_aggregates
SET     bike_driver_aggressive_rank = ranks.rank
FROM    ranks
WHERE   crash_aggregates.int_id = ranks.int_id;

-- bike_allinjury
UPDATE  generated.crash_aggregates
SET     bike_allinjury = (
           SELECT  COUNT(*)
           FROM    crashes_bike1 c
           WHERE   c.int_id = crash_aggregates.int_id
           AND     c.injurycrash
       ) + (
           SELECT  COUNT(*)
           FROM    crashes_bike2 c
           WHERE   c.int_id = crash_aggregates.int_id
           AND     c.injurycrash
       );
WITH ranks AS (
   SELECT  int_id,
           rank() OVER (ORDER BY bike_allinjury DESC) AS rank
   FROM    crash_aggregates
)
UPDATE  generated.crash_aggregates
SET     bike_allinjury_rank = ranks.rank
FROM    ranks
WHERE   crash_aggregates.int_id = ranks.int_id;

crash_aggregates 表是這樣創建的:

CREATE TABLE crash_aggregates
(
 int_id integer NOT NULL,
 geom geometry(Point,2231),
 bike_driver_aggressive integer,
 bike_driver_aggressive_rank integer,
 bike_driver_failyield integer,
 bike_driver_failyield_rank integer,
 bike_driver_disregardsignal integer,
 bike_driver_disregardsignal_rank integer,
 bike_highspeed integer,
 bike_highspeed_rank integer,
 bike_biker_aggressive integer,
 bike_biker_aggressive_rank integer,
 bike_biker_failyield integer,
 bike_biker_failyield_rank integer,
 bike_biker_disregardsignal integer,
 bike_biker_disregardsignal_rank integer,
 bike_influence integer,
 bike_influence_rank integer,
 bike_driver_distracted integer,
 bike_driver_distracted_rank integer,
 bike_driver_reckless integer,
 bike_driver_reckless_rank integer,
 bike_tbone integer,
 bike_tbone_rank integer,
 bike_opp_lhook integer,
 bike_opp_lhook_rank integer,
 bike_samedir integer,
 bike_samedir_rank integer,
 bike_samedir_rhook1 integer,
 bike_samedir_rhook1_rank integer,
 bike_samedir_rhook2 integer,
 bike_samedir_rhook2_rank integer,
 bike_perp_rhook integer,
 bike_perp_rhook_rank integer,
 bike_perp_rhook_swalk1 integer,
 bike_perp_rhook_swalk1_rank integer,
 bike_perp_rhook_swalk2 integer,
 bike_perp_rhook_swalk2_rank integer,
 bike_tbone_swalk1 integer,
 bike_tbone_swalk1_rank integer,
 bike_tbone_swalk2 integer,
 bike_tbone_swalk2_rank integer,
 bike_allfatal integer,
 bike_allfatal_rank integer,
 bike_allinjury integer,
 bike_allinjury_rank integer,
 bike_injuryfatal integer,
 bike_injuryfatal_rank integer,
 bike_top10 integer,
 bike_num1s integer,
 bike_num2s integer,
 bike_num3s integer,
 bike_num4s integer,
 bike_num5s integer,
 bike_num6s integer,
 bike_num7s integer,
 bike_num8s integer,
 bike_num9s integer,
 bike_num10s integer,
 CONSTRAINT crash_aggregates_pkey PRIMARY KEY (int_id)
);

自行車1表:

CREATE TABLE received.crashes_bike1
(
 caseid text,
 year integer,
 unittype_one text,
 unittype_two text,
 unittype_three text,
 circumstance text,
 primary_contrib text,
 condition_1 text,
 condition_2 text,
 condition_3 text,
 dirfromint text,
 diroftravel_one text,
 diroftravel_two text,
 tdg_directions text,
 dir_key integer,
 directions text,
 bike_mvmt text,
 veh_mvmt text,
 comb_mvmt text,
 comb_mvmt_sw text,
 map_code integer,
 injury boolean,
 diroftravel_three text,
 disabled_st1 text,
 disabled_st2 text,
 contrib_1 text,
 contrib_2 text,
 contrib_3 text,
 enteredby text,
 entereddate text,
 estvehspeed_one integer,
 estvehspeed_two integer,
 estvehspeed_three integer,
 feetfromint integer,
 firstharmful text,
 mostharmful text,
 secondharmful text,
 internamedir text,
 lightingcondition text,
 location text,
 masterid integer,
 precrashmaneuv_1 text,
 precrashmaneuv_2 text,
 precrashmaneuv_3 text,
 node integer,
 numberinjured integer,
 numberoffatalities integer,
 pedaction_one text,
 pedaction_two text,
 pedaction_three text,
 publicproperty text,
 railroadcrossing text,
 roadcondition text,
 roadcontour text,
 roaddescription text,
 roadsurface text,
 safetyequipmenthelmet_one text,
 safetyequipmenthelmet_two text,
 safetyequipmenthelmet_three text,
 safetyequipsystem_one text,
 safetyequipsystem_two text,
 safetyequipsystem_three text,
 safetyequipuse_one text,
 safetyequipuse_two text,
 safetyequipuse_three text,
 speedlimit_one integer,
 speedlimit_two integer,
 speedlimit_three integer,
 street1 integer,
 street2 integer,
 streetname_st1 text,
 streetname_st2 text,
 street_intersection text,
 totvehs integer,
 unitage_one integer,
 unitage_two integer,
 unitage_three integer,
 vehcomb_one text,
 vehcomb_two text,
 vehcomb_three text,
 vehicledefect_one text,
 vehicledefect_two text,
 vehicledefect_three text,
 technicaljudgement text,
 notes text,
 typology text,
 same_dir boolean,
 opp_dir boolean,
 perpen boolean,
 angle boolean,
 notes2 text,
 sw text,
 ww_sw text,
 cw_dwy_alley text,
 day_week text,
 weekday text,
 trail_access text,
 bike_s_veh_s_st_p boolean,
 bike_s_veh_lt_st_od boolean,
 bike_s_veh_rt_st_p boolean,
 bike_s_veh_rt_st_sd boolean,
 bike_s_veh_s_st_sd boolean,
 bike_s_veh_rt_st_ww_p boolean,
 bike_s_veh_s_sw_ww_p boolean,
 bike_s_veh_rt_sw_ww_p boolean,
 highspeed boolean,
 injurycrash boolean,
 id integer NOT NULL DEFAULT nextval('crashes_bike1_id_seq'::regclass),
 road_id1 integer,
 road_id2 integer,
 at_intersection boolean,
 int_id integer,
 CONSTRAINT crashes_bike1_pkey PRIMARY KEY (id)
);

CREATE INDEX idx_crashbike1bsvlso
 ON received.crashes_bike1
 USING btree
 (bike_s_veh_lt_st_od);

CREATE INDEX idx_crashbike1bsvrsp
 ON received.crashes_bike1
 USING btree
 (bike_s_veh_rt_st_p);

CREATE INDEX idx_crashbike1bsvrssd
 ON received.crashes_bike1
 USING btree
 (bike_s_veh_rt_st_sd);

CREATE INDEX idx_crashbike1bsvrswp
 ON received.crashes_bike1
 USING btree
 (bike_s_veh_rt_sw_ww_p);

CREATE INDEX idx_crashbike1bsvssp
 ON received.crashes_bike1
 USING btree
 (bike_s_veh_s_st_p);

CREATE INDEX idx_crashbike1bsvsstsd
 ON received.crashes_bike1
 USING btree
 (bike_s_veh_s_st_sd);

CREATE INDEX idx_crashbike1bsvsswwwp
 ON received.crashes_bike1
 USING btree
 (bike_s_veh_s_sw_ww_p);

CREATE INDEX idx_crashbike1inj
 ON received.crashes_bike1
 USING btree
 (injurycrash);

CREATE INDEX idx_crashbike1int
 ON received.crashes_bike1
 USING btree
 (int_id);

自行車2表:

CREATE TABLE received.crashes_bike2
(
 accidentdate date,
 accidenttime time without time zone,
 adverseweather text,
 appovertaketurn text,
 caseid text,
 constructionzone text,
 contribfact_one text,
 contribfact_two text,
 contribfact_three text,
 dirfromint text,
 diroftravel_one text,
 diroftravel_two text,
 diroftravel_three text,
 disabled_st1 text,
 disabled_st2 text,
 driveraction_one text,
 driveraction_two text,
 enteredby text,
 entereddate text,
 estvehspeed_one integer,
 estvehspeed_two integer,
 estvehspeed_three integer,
 feetfromint integer,
 firstharmful text,
 mostharmful text,
 secondharmful text,
 internamedir text,
 lightingcondition text,
 location text,
 masterid integer,
 node integer,
 numberinjured integer,
 numberoffatalities integer,
 roadcontour text,
 roaddescription text,
 roadsurface text,
 rownum integer,
 speedlimit_one integer,
 speedlimit_two integer,
 speedlimit_three integer,
 street1 integer,
 street2 integer,
 streetname_st1 text,
 streetname_st2 text,
 totvehs integer,
 unitage_one integer,
 unitage_two integer,
 unitage_three integer,
 vehcomb_one text,
 vehcomb_two text,
 vehcomb_three text,
 unittype_one text,
 unittype_two text,
 unittype_three text,
 movement_one text,
 movement_two text,
 movement_three text,
 circumstance text,
 sw text,
 othercw text,
 motoristplacement text,
 relationshipofplacements text,
 wwswriding text,
 bicyclelane text,
 dooring text,
 bicyclewwstreetriding text,
 crashmonth integer,
 crashday text,
 hour time without time zone,
 ridinglocation text,
 crashyear integer,
 injurycrash boolean,
 fatalcrash boolean,
 noinjuryfatality boolean,
 unit1_veh boolean,
 unit2_veh boolean,
 unit1_bike boolean,
 unit2_bike boolean,
 intdistance_ft text,
 hrgrp text,
 bicyclist boolean,
 bikeaction_one text,
 bikeaction_two text,
 newdriveraction_one text,
 newdriveraction_two text,
 bikeaction_one2 text,
 bikeaction_two2 integer,
 newbikeaction text,
 newdriveraction_one2 text,
 newdriveraction_two2 integer,
 newdriveraction2 text,
 bike_movement text,
 driver_movement text,
 unit1 text,
 unit2 text,
 unit3 text,
 complex boolean,
 nobike boolean,
 bike_movement2 text,
 bike_fault boolean,
 driver_movement2 text,
 crashtype text,
 relationship text,
 ww boolean,
 xwalk boolean,
 bikelane text,
 sidewalk boolean,
 location2 text,
 direction text,
 newcrashtype text,
 bike_s_veh_s_st_p boolean,
 bike_s_veh_lt_st_od boolean,
 bike_s_veh_rt_st_p boolean,
 bike_s_veh_rt_st_sd boolean,
 bike_s_veh_s_st_sd boolean,
 bike_s_veh_rt_st_ww_p boolean,
 highspeed boolean,
 atfault text,
 influence boolean,
 distracted_driverfault boolean,
 aggressive_driverfault boolean,
 inexperience_bikerfault boolean,
 aggressive_bikerfault boolean,
 failyield_driverfault boolean,
 carereckless_driverfault boolean,
 disregardsignal_driverfault boolean,
 failyield_bikerfault boolean,
 disregardsignal_bikerfault boolean,
 allothercrashtype boolean,
 bike_s_veh_rt_sw_ww_p boolean,
 bike_s_veh_s_sw_ww_p boolean,
 id integer NOT NULL DEFAULT nextval('crashes_bike2_id_seq'::regclass),
 road_id1 integer,
 road_id2 integer,
 at_intersection boolean,
 int_id integer,
 CONSTRAINT crashes_bike2_pkey PRIMARY KEY (id)
);

CREATE INDEX idx_crashbike2aggbkflt
 ON received.crashes_bike2
 USING btree
 (aggressive_bikerfault);

CREATE INDEX idx_crashbike2bsvhrtstsd
 ON received.crashes_bike2
 USING btree
 (bike_s_veh_rt_st_sd);

CREATE INDEX idx_crashbike2bsvlso
 ON received.crashes_bike2
 USING btree
 (bike_s_veh_lt_st_od);

CREATE INDEX idx_crashbike2bsvrsp
 ON received.crashes_bike2
 USING btree
 (bike_s_veh_rt_st_p);

CREATE INDEX idx_crashbike2bsvrswp
 ON received.crashes_bike2
 USING btree
 (bike_s_veh_rt_sw_ww_p);

CREATE INDEX idx_crashbike2bsvssp
 ON received.crashes_bike2
 USING btree
 (bike_s_veh_s_st_p);

CREATE INDEX idx_crashbike2bsvsstsd
 ON received.crashes_bike2
 USING btree
 (bike_s_veh_s_st_sd);

CREATE INDEX idx_crashbike2bsvsswwwp
 ON received.crashes_bike2
 USING btree
 (bike_s_veh_s_sw_ww_p);

CREATE INDEX idx_crashbike2carreckdrv
 ON received.crashes_bike2
 USING btree
 (carereckless_driverfault);

CREATE INDEX idx_crashbike2drvflt
 ON received.crashes_bike2
 USING btree
 (aggressive_driverfault);

CREATE INDEX idx_crashbike2dsrgdsgndrv
 ON received.crashes_bike2
 USING btree
 (disregardsignal_driverfault);

CREATE INDEX idx_crashbike2dsrgsgnbk
 ON received.crashes_bike2
 USING btree
 (disregardsignal_bikerfault);

CREATE INDEX idx_crashbike2dstdrv
 ON received.crashes_bike2
 USING btree
 (distracted_driverfault);

CREATE INDEX idx_crashbike2dui
 ON received.crashes_bike2
 USING btree
 (influence);

CREATE INDEX idx_crashbike2flyldbik
 ON received.crashes_bike2
 USING btree
 (failyield_bikerfault);

CREATE INDEX idx_crashbike2flylddrv
 ON received.crashes_bike2
 USING btree
 (failyield_driverfault);

CREATE INDEX idx_crashbike2ftl
 ON received.crashes_bike2
 USING btree
 (fatalcrash);

CREATE INDEX idx_crashbike2hispd
 ON received.crashes_bike2
 USING btree
 (highspeed);

CREATE INDEX idx_crashbike2inj
 ON received.crashes_bike2
 USING btree
 (injurycrash);

CREATE INDEX idx_crashbike2int
 ON received.crashes_bike2
 USING btree
 (int_id);

揭穿

如果您在同一個事務中進行所有更新,則每個更新都必須處理越來越大的(物理)元組集。請參見以下範例:

CREATE TABLE explode (id integer, something text);

INSERT INTO explode SELECT i, md5(i::text) FROM generate_series(1, 100000) t(i);

\dt+ explode -- done in psql
                    List of relations
Schema │  Name   │ Type  │ Owner  │  Size   │ Description 
────────┼─────────┼───────┼────────┼─────────┼─────────────
test   │ explode │ table │ avaczi │ 6704 kB │ 

BEGIN;

UPDATE explode SET something = something || 'a';

\dt+ explode 
test   │ explode │ table │ avaczi │ 13 MB │ 

UPDATE explode SET something = something || 'a';

\dt+ explode 
test   │ explode │ table │ avaczi │ 20 MB │ 

COMMIT;

即使不同的更新不在同一個事務中,您也可能會看到這一點。如果 autovacuum(或手動 VACUUM)跟不上變化的速度,就會出現這種情況。

原因是 PostgreSQL 中的 MVCC 是如何工作的。在進行更新時,它會使用新值創建一個新的物理行,將舊的行標記為對在目前行之後開始的事務不可見。這意味著一旦事務送出,物理級別上的行版本就會“浪費空間”。這些最終必須通過(自動)吸塵來釋放。

現在,當您在事務中時,不清楚它是送出還是回滾,因此數據庫必須保留所有過時的行版本,以便可以恢復到原始行版本。(好吧,不是全部,似乎有一些優化,但肯定有一些優化:在我的實驗中,上面看到的表增長到 200 MB(而不是 6.6)。)這意味著你的表的物理大小會增長並且增長 - 每個步驟的大小取決於您在UPDATE語句中的具體操作。

這是您可以大大改善您的流程的地方。目前,您有許多無條件更改整個表的更新,這實際上意味著在每次迭代中,您的表大小是前一次的兩倍。經過 46 輪(或 23 輪)後,即使是相對較小的桌子也可以變得非常大。

因此,對保留的所有版本進行操作所需的時間越來越多。在我的執行中,第一次UPDATE在 190-200 毫秒內完成,第 20 次迭代達到 400 毫秒。由於您有很好的寬表,並且單獨執行時更新已經花費了 30-40 秒,您可以預期會出現嚴重的減速。

(可能的)解決方案

由於您的更新看起來非常相似,並且它們只會影響同一張表的不同列,您可以嘗試類似

UPDATE  generated.crash_aggregates
SET     bike_driver_aggressive = (
           SELECT  COUNT(*)
           FROM    crashes_bike2 c
           WHERE   c.int_id = crash_aggregates.int_id
           AND     c.aggressive_driverfault
       ),
       bike_allinjury = (
           SELECT  COUNT(*)
           FROM    crashes_bike1 c
           WHERE   c.int_id = crash_aggregates.int_id
           AND     c.injurycrash
       ) + (
           SELECT  COUNT(*)
           FROM    crashes_bike2 c
           WHERE   c.int_id = crash_aggregates.int_id
           AND     c.injurycrash
       ),
       [...];

這意味著 CPU 將不得不考慮所有的因素,但你為他們付出了代價來完成他們的工作。同時,表只會被重寫一次。

完成所有這些後,似乎只計算一次排名就足夠了,方法是類似地編寫一個UPDATE查詢以僅在一輪內更新它們。

改進更多

好吧,以上UPDATE(正如評論中所建議的)遠非有史以來最有效的。這是對它進行更多改進的方法。

因此,看起來,計數是一一計算的,這意味著兩個源表將被訪問多次。根據它們的大小(和其他一些因素),這可能很糟糕。在您的情況下,它保持在疼痛門檻值以下,否則您也已經抱怨過;)在其他情況下,可能並非如此。

這個想法是可以在一次執行中收集所有聚合,並將其用作更新的源。為此,我們可以建構一個大的聚合結構:

SELECT int_id,
      sum(c1.injurycrash::integer) + sum(c2.injurycrash::integer), 
      sum(aggressive_driverfault::integer),
      ...
 FROM crashes_bike1 AS c1, crashes_bike2 AS c2
GROUP BY int_id;

在這裡,我假設不同標誌(如injurycrash)的分佈使得TRUE值不是很罕見。如果是這樣,那麼對整個crashes_bike表的一次大掃描(嗯,他們兩個)可能比許多索引(僅)掃描更糟糕。但是,我沒有看到您將所有內容都編入索引(這可能沒有意義 - 沒有實際數據很難判斷)。

我已經在上面的聚合中替換count()了 for sum()。訣竅是將布爾值轉換為整數並將它們相加,因此避免了一組非常複雜的CASE表達式。

一旦我們有了上述結果集,我們就可以將其插入UPDATE自身:

WITH aggregates1 AS (
   SELECT int_id,
          sum(injurycrash::integer), 
          sum(aggressive_driverfault::integer) AS aggressive_driverfault,
          ...
     FROM crashes_bike1
    GROUP BY int_id
), aggregates2 AS (
   SELECT sum(injurycrash::integer) AS injurycrash,
          ...
     FROM crashes_bike2
    GROUP BY int_id
)
UPDATE crash_aggregates AS ca
  SET bike_allinjury = a1.injurycrash + a2.injurycrash,
      bike_driver_aggressive = a.aggressive_driverfault,
      ...
 FROM aggregates1 AS a1
 FULL JOIN aggregates2 AS a2 USING (int_id)
WHERE ca.int_id = a1.int_id OR ca.int_id = a2.int_id;

引用自:https://dba.stackexchange.com/questions/156616