Select union of dataset-generating function results when applied element-wise to an array

April 7, 2017

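This question is mostly a matter of academic curiosity. I am trying to implement a SQL (i.e. not plpgSQL) function that takes a set of input data, uses some other black-box function to transform each entry into a set of zero or more rows, and then returns the concatenation or union of those result datasets.

I have a function that takes a single value and returns zero or more records:

f: x -> y[]

I have another function that should take an array of x values as input and apply f to each element, returning the union of all the returned recordsets:

g: x[] -> y[], returning union or concatenation of { f(x) for each x in x[] }

Initially, I don't care whether the result set contains duplicates, although it would be better if it didn't.

I considered using a "recursive" CTE to iterate over the array. Using the typical CTE example with a tree structure: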

create table node(id int primary key, parent_id int);

insert into node values                                                                                                                                                          
   (0, null),                                                                                                                                                                   
   (1, 0),                                                                                                                                                                      
   (2, 0),                                                                                                                                                                      
   (3, 2),                                                                                                                                                                      
   (4, 0),                                                                                                                                                                      
   (5, 1),                                                                                                                                                                      
   (6, 1),                                                                                                                                                                      
   (7, 1),                                                                                                                                                                      
   (8, 2),                                                                                                                                                                      
   (9, 3),                                                                                                                                                                      
   (10, 3);                                                                                                                                                                     

create function f(p_id int)                                                                                                                                                      
returns table (id int, parent_id int)                                                                                                                                            
as $$                                                                                                                                                                            
   select * from node where node.parent_id = p_id;                                                                                                                              
$$ language sql stable;                                                                                                                                                          

create function g(p_ids int[])                                                                                                                                                   
returns table (id int, parent_id int, x int)                                                                                                                                     
as $$                                                                                                                                                                            
   with recursive res(id, parent_id, i) as (                                                                                                                                    
       select null::int, null::int, array_lower(p_ids, 1)                                                                                                                       
       union all                                                                                                                                                                
       select tmp.id, tmp.parent_id, res.i + 1                                                                                                                                  
       from res, f(p_ids[res.i]) as tmp                                                                                                                                         
       where res.i <= array_upper(p_ids, 1)
   ) select res.* from res where id is not null;                                                                                                                                
$$ language sql stable;                                                                                                                                                          

select * from f(2);                                                                                                                                                              

select * from g(ARRAY[2, 1]);

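This works, and I can control whether I want duplicates by choosing union or union all. However, I assume the explicit iteration introduces an optimization barrier, which could be bad if the data is large, the array is long, and the function f(x) is something stupidly simple such as: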

select a, b, c from node where node.id = x

In that case the whole query could be optimized into the equivalent of a simple:

select a, b, c from node where node.id = any (xs)

But presumably it can't be, because the planner is unable to realize that the iterative CTE part can be converted into a set operation.

–

I decided not to be lazy and to test it:

create function f2(p_ids int[])                                                                                                                                                  
returns table (id int, parent_id int)                                                                                                                                            
as $$                                                                                                                                                                            
   select * from node where node.parent_id = ANY ( p_ids );                                                                                                                     
$$ language sql stable;

Then, using what is probably my favourite Postgres feature:

explain analyze select * from f(2);

Seq Scan on node  (cost=0.00..38.25 rows=11 width=8) (actual time=0.003..0.004 rows=2 loops=1)
  Filter: (parent_id = 2)
  Rows Removed by Filter: 9
Planning time: 0.051 ms
Execution time: 0.012 ms
(5 rows)

explain analyze select * from f2(ARRAY[2, 1]);

Seq Scan on node  (cost=0.00..38.25 rows=23 width=8) (actual time=0.004..0.005 rows=5 loops=1)
  Filter: (parent_id = ANY ('{2,1}'::integer[]))
  Rows Removed by Filter: 6
Planning time: 0.055 ms
Execution time: 0.013 ms
(5 rows)

explain analyze select * from g(ARRAY[2, 1]);

CTE Scan on res  (cost=424.46..431.28 rows=339 width=12) (actual time=0.026..0.046 rows=8 loops=1)
  Filter: (id IS NOT NULL)
  Rows Removed by Filter: 1
  CTE res
    ->  Recursive Union  (cost=0.00..424.46 rows=341 width=12) (actual time=0.002..0.042 rows=9 loops=1)
          ->  Result  (cost=0.00..0.01 rows=1 width=12) (actual time=0.000..0.000 rows=1 loops=1)
          ->  Hash Join  (cost=0.26..41.76 rows=34 width=12) (actual time=0.010..0.011 rows=3 loops=3)
                Hash Cond: (node.parent_id = ('{2,1}'::integer[])[res_1.i])
                ->  Seq Scan on node  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.002..0.003 rows=11 loops=2)
                ->  Hash  (cost=0.22..0.22 rows=3 width=4) (actual time=0.002..0.002 rows=1 loops=3)
                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                      ->  WorkTable Scan on res res_1  (cost=0.00..0.22 rows=3 width=4) (actual time=0.001..0.001 rows=1 loops=3)
                            Filter: (i <= 2)
                            Rows Removed by Filter: 2
Planning time: 0.255 ms
Execution time: 0.083 ms
(16 rows)
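
One detail in the plans above may be worth a second look: g(ARRAY[2, 1]) reports 8 rows, while f2(ARRAY[2, 1]) reports 5. A plausible reading is that the recursive step joins every row produced by the previous step to f(p_ids[res.i]), so the results for later array elements are repeated once per earlier row. The query below is only a quick check of that reading against the sample node table; the column alias copies is made up for illustration.

```sql
-- Count how often g returns each (id, parent_id) pair.
-- A value of copies greater than 1 means the recursive join repeated that row.
select id, parent_id, count(*) as copies
from g(ARRAY[2, 1])
group by id, parent_id
order by id;
```

If repeated rows do show up, switching union all to union inside g removes them from the result, as the question already notes.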

Is there a better way to implement this select-many, ideally a purely set-based approach that does not suffer from optimization barriers?

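I know that an index on parent_id would help if the table were bigger, but I am more interested in how to express the iterative query differently to improve performance.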
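
For completeness, the index mentioned above could be sketched as follows. The index name node_parent_id_idx is an assumption chosen for illustration, and with only the 11 sample rows the planner may well keep choosing a sequential scan.

```sql
-- Hypothetical index name; any valid identifier would do.
create index node_parent_id_idx on node (parent_id);

-- Refresh planner statistics so the new index is costed with current row counts.
analyze node;

-- Re-check the plan for the set-based variant from the question.
explain analyze select * from f2(ARRAY[2, 1]);
```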

This can be solved with unnest, or by changing the original array-returning function to return table/setof. The result can then be used in a join.
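
A minimal sketch of the unnest approach, assuming the node table and the function f(int) from the question. The function name g2 and the alias x(p_id) are made up for illustration, and the lateral join needs PostgreSQL 9.3 or later.

```sql
-- Hypothetical set-based replacement for g; relies on node and f(int) from the question.
create function g2(p_ids int[])
returns table (id int, parent_id int)
as $$
    -- unnest turns the array into one row per element;
    -- the lateral join then calls f once for each of those rows.
    select tmp.id, tmp.parent_id
    from unnest(p_ids) as x(p_id)
    cross join lateral f(x.p_id) as tmp;
$$ language sql stable;

-- Concatenation of f(2) and f(1).
select * from g2(ARRAY[2, 1]);

-- Deduplicated variant, useful when the input array repeats a value.
select distinct * from g2(ARRAY[2, 2, 1]);
```

Whether the planner can inline f and turn this into a plain scan or join on node depends on the PostgreSQL version and on how f is defined, so it is worth comparing the output of explain analyze select * from g2(ARRAY[2, 1]) with the plans shown in the question.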

Source: https://dba.stackexchange.com/questions/167386
