Postgresql

當其他子查詢使用非並行聚合時,為什麼子查詢不使用並行聚合?

  • May 10, 2022

我定義了 2 個等效聚合。第一個支持並行模式,但第二個不支持。我用索引定義了 2 個相同的表。

create table jsfield(j jsonb);
insert into jsfield 
 select jsonb_build_object('c', s) 
  from generate_series(1, 100000) s; 
create index jsfield_idx on jsfield(j);

create table jsfield2 as select * from jsfield;
create index jsfield2_idx on jsfield2(j);

create or replace function shallow_jsonb_object_merge(st jsonb, a jsonb) 
 returns jsonb 
 immutable 
 language sql
 as $$ select st || a ; $$ ;

create or replace aggregate "Semigroup_seq" (jsonb)
(
 sfunc = shallow_jsonb_object_merge,
 stype = jsonb
);

create or replace aggregate "Semigroup_par" (jsonb)
(
 sfunc = shallow_jsonb_object_merge,
 stype = jsonb,
 combinefunc = shallow_jsonb_object_merge,
 parallel = safe,
 initcond = '{}'
);

當單獨使用聚合時,explain analyze 會產生預期的計劃:

explain analyze select "Semigroup_seq"(j) from jsfield ;
                                                    QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=26737.04..26737.05 rows=1 width=32) (actual time=122.797..122.797 rows=1 loops=1)
  ->  Seq Scan on jsfield  (cost=0.00..1736.04 rows=100004 width=26) (actual time=0.015..6.934 rows=100004 loops=1)
Planning Time: 0.420 ms
Execution Time: 122.847 ms


explain analyze select "Semigroup_par"(j) from jsfield2 ;
                                                                          QUERY PLAN                                                                            
---------------------------------------------
Finalize Aggregate  (cost=10559.84..10559.85 rows=1 width=32) (actual time=61.475..64.994 rows=1 loops=1)
  ->  Gather  (cost=10558.78..10559.09 rows=3 width=32) (actual time=61.363..64.884 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        ->  Partial Aggregate  (cost=9558.78..9558.79 rows=1 width=32) (actual time=42.316..42.316 rows=1 loops=4)
              ->  Parallel Index Only Scan using jsfield2_idx on jsfield2  (cost=0.42..1494.03 rows=32259 width=26) (actual time=0.030..2.781 rows=25001 loops=4)
                    Heap Fetches: 0
Planning Time: 0.363 ms
Execution Time: 65.065 ms

並行聚合執行肯定更快,但是當我將兩個查詢作為子查詢同時執行時,解釋顯示兩個查詢的順序掃描!

explain analyze select 
 (select "Semigroup_par"(j) from jsfield2), 
 (select "Semigroup_seq"(j) from jsfield) ;
                                                         QUERY PLAN                                                          
--------------------------------------------------
Result  (cost=53474.10..53474.11 rows=1 width=64) (actual time=271.377..271.378 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Aggregate  (cost=26737.04..26737.05 rows=1 width=32) (actual time=174.772..174.773 rows=1 loops=1)
          ->  Seq Scan on jsfield2  (cost=0.00..1736.04 rows=100004 width=26) (actual time=0.007..7.586 rows=100004 loops=1)
  InitPlan 2 (returns $1)
    ->  Aggregate  (cost=26737.04..26737.05 rows=1 width=32) (actual time=96.599..96.599 rows=1 loops=1)
          ->  Seq Scan on jsfield  (cost=0.00..1736.04 rows=100004 width=26) (actual time=0.006..5.362 rows=100004 loops=1)
Planning Time: 0.581 ms
Execution Time: 271.407 ms

為什麼規劃器更喜歡 seq 掃描而不是並行掃描?是因為兩個子查詢的成本相同並且並行模式的記憶體要貴得多嗎?雖然刨床如何知道它是否僅由成本參數指導。

這可能是一個錯誤嗎?它是可調節的行為嗎?

          server_version            
-------------------------------------
14.2 (Ubuntu 14.2-1.pgdg22.04+1+b1)

不,這是按預期工作的。如果在查詢中的任何地方使用了並行不安全函式,則該查詢不能使用並行性,句點。用文件的話來說:

當查詢包含任何並行不安全的內容時,該查詢將完全禁用並行查詢。

引用自:https://dba.stackexchange.com/questions/311920