PostgreSQL

Selecting the union of the results of a set-returning function applied element-wise to an array

  • April 7, 2017

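This question is mostly a matter of academic curiosity. I am trying to implement an SQL (i.e. not PL/pgSQL) function that takes a set of input data, transforms each entry into a set of zero or more rows using some other black-box function, and then returns the concatenation or union of those result sets.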

I have a function that takes a single value and returns zero or more records:

f: x -> y[]

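I have another function that should take an array of x values as input and apply f to each element, returning the union of all the returned record sets: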

g: x[] -> y[], returning union or concatenation of { f(x) for each x in x[] }

Initially, I don't care whether the result set contains duplicates, although it would be nicer if it did not.

I thought of using a "recursive" CTE to iterate over the array. Using the typical CTE example with a tree structure:

create table node(id int primary key, parent_id int);

insert into node values
   (0, null),
   (1, 0),
   (2, 0),
   (3, 2),
   (4, 0),
   (5, 1),
   (6, 1),
   (7, 1),
   (8, 2),
   (9, 3),
   (10, 3);

create function f(p_id int)
returns table (id int, parent_id int)
as $$
   select * from node where node.parent_id = p_id;
$$ language sql stable;

create function g(p_ids int[])
returns table (id int, parent_id int, x int)
as $$
   with recursive res(id, parent_id, i) as (
       select null::int, null::int, array_lower(p_ids, 1)
       union all
       select tmp.id, tmp.parent_id, res.i + 1
       from res, f(p_ids[res.i]) as tmp
       where res.i <= array_upper(p_ids, 1)
   ) select res.* from res where id is not null;
$$ language sql stable;

select * from f(2);

select * from g(ARRAY[2, 1]);

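This works, and I can control whether I want duplicates by choosing between union and union all. However, I assume the explicit iteration introduces an optimization barrier, which could be bad if the data is large, the array is long, and the function f(x) is something trivially simple such as: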

select a, b, c from node where node.id = x

In that case, the whole query could be optimized into the equivalent of a simple:

select a, b, c from node where node.id = any (xs)

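But presumably that cannot happen, because the planner is unable to recognize that the iterative CTE part can be converted into a set operation.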

I decided not to be lazy and to test it:

create function f2(p_ids int[])
returns table (id int, parent_id int)
as $$
   select * from node where node.parent_id = ANY ( p_ids );
$$ language sql stable;

And then, what is probably my favorite Postgres feature:

explain analyze select * from f(2);

Seq Scan on node  (cost=0.00..38.25 rows=11 width=8) (actual time=0.003..0.004 rows=2 loops=1)
  Filter: (parent_id = 2)
  Rows Removed by Filter: 9
Planning time: 0.051 ms
Execution time: 0.012 ms
(5 rows)

explain analyze select * from f2(ARRAY[2, 1]);

Seq Scan on node  (cost=0.00..38.25 rows=23 width=8) (actual time=0.004..0.005 rows=5 loops=1)
  Filter: (parent_id = ANY ('{2,1}'::integer[]))
  Rows Removed by Filter: 6
Planning time: 0.055 ms
Execution time: 0.013 ms
(5 rows)

explain analyze select * from g(ARRAY[2, 1]);

CTE Scan on res  (cost=424.46..431.28 rows=339 width=12) (actual time=0.026..0.046 rows=8 loops=1)
  Filter: (id IS NOT NULL)
  Rows Removed by Filter: 1
  CTE res
    ->  Recursive Union  (cost=0.00..424.46 rows=341 width=12) (actual time=0.002..0.042 rows=9 loops=1)
          ->  Result  (cost=0.00..0.01 rows=1 width=12) (actual time=0.000..0.000 rows=1 loops=1)
          ->  Hash Join  (cost=0.26..41.76 rows=34 width=12) (actual time=0.010..0.011 rows=3 loops=3)
                Hash Cond: (node.parent_id = ('{2,1}'::integer[])[res_1.i])
                ->  Seq Scan on node  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.002..0.003 rows=11 loops=2)
                ->  Hash  (cost=0.22..0.22 rows=3 width=4) (actual time=0.002..0.002 rows=1 loops=3)
                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                      ->  WorkTable Scan on res res_1  (cost=0.00..0.22 rows=3 width=4) (actual time=0.001..0.001 rows=1 loops=3)
                            Filter: (i <= 2)
                            Rows Removed by Filter: 2
Planning time: 0.255 ms
Execution time: 0.083 ms
(16 rows)

Is there a better way to implement this select-many, ideally a purely set-based approach that does not introduce an optimization barrier?

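I know that an index on parent_id would help if the table were larger, but I am more interested in how to express the iterative query differently to improve performance.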

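This can be solved with unnest, or by changing the original array-returning function so that it returns table/setof instead.

The result can then be used in a join.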
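
A minimal sketch of the unnest-plus-join idea, assuming the node table and the function f from the question. The setup is repeated only so that the block runs on its own, and it needs PostgreSQL 9.5 or later because of on conflict. The name g_lateral is made up for illustration and does not appear in the original post.

```sql
-- Setup repeated from the question so that the sketch runs on its own.
create table if not exists node(id int primary key, parent_id int);

insert into node values
    (0, null), (1, 0), (2, 0), (3, 2), (4, 0), (5, 1),
    (6, 1), (7, 1), (8, 2), (9, 3), (10, 3)
on conflict (id) do nothing;

create or replace function f(p_id int)
returns table (id int, parent_id int)
as $$
    select * from node where node.parent_id = p_id;
$$ language sql stable;

-- g_lateral is a hypothetical name, not part of the original post.
-- unnest() turns the array into one row per element; the lateral join
-- then calls f once per element and concatenates the results.
create or replace function g_lateral(p_ids int[])
returns table (id int, parent_id int)
as $$
    select t.id, t.parent_id
    from unnest(p_ids) as u(x)
    cross join lateral f(u.x) as t;
$$ language sql stable;

select * from g_lateral(array[2, 1]);
```

With the sample data this returns five rows for array[2, 1], the same rows as f2. If duplicates should be removed, select distinct in the body is one option. Whether the planner can inline f into the join depends on the function and the PostgreSQL version, so it is worth checking the plan with explain analyze.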

Source: https://dba.stackexchange.com/questions/167386