Postgresql
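Selecting the union of a set-returning function's results when applied element-wise to an array
This question is something of an academic curiosity. I am trying to implement an SQL (i.e. not plpgSQL) function that takes an array of input values, uses some other black-box function to transform each entry into a set of zero or more rows, and then returns the concatenation or union of those result sets.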
I have a function that takes a single value and returns zero or more records:
f: x -> y[]
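I have another function, which should take an array of x values as input and apply f to each element, returning the union of all the returned record sets:
g: x[] -> y[], returning union or concatenation of { f(x) for each x in x[] }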
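Initially, I don't care whether the result set contains duplicates, although it would be preferable if it didn't.
I considered using a "recursive" CTE to iterate over the array. Using the typical CTE example with a tree structure: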
    create table node(id int primary key, parent_id int);

    insert into node values
        (0, null), (1, 0), (2, 0), (3, 2), (4, 0), (5, 1),
        (6, 1), (7, 1), (8, 2), (9, 3), (10, 3);

    -- f: returns the direct children of a single node
    create function f(p_id int) returns table (id int, parent_id int) as $$
        select * from node where node.parent_id = p_id;
    $$ language sql stable;

    -- g: walks the array by index i and calls f on each element
    create function g(p_ids int[]) returns table (id int, parent_id int, x int) as $$
        with recursive res(id, parent_id, i) as (
            select null::int, null::int, array_lower(p_ids, 1)
            union all
            select tmp.id, tmp.parent_id, res.i + 1
            from res, f(p_ids[res.i]) as tmp
            where res.i <= array_upper(p_ids, 1)
        )
        select res.* from res where id is not null;
    $$ language sql stable;

    select * from f(2);
    select * from g(ARRAY[2, 1]);
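This works, and I can control whether I want duplicates via union / union all, but I assume the explicit iteration imposes an optimization barrier. That could be bad if the data is big, the array is long, and the function f(x) is stupidly simple, such as:
    select a, b, c from node where node.id = x
In that case, the entire query could be optimized to the equivalent of a simple:
    select a, b, c from node where node.id = any (xs)
But presumably it can't be, because the planner is unable to realise that the iterative CTE part can be converted to a set operation.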
–
I decided not to be lazy and to test it:
    -- f2: set-based version, filters on the whole array at once
    create function f2(p_ids int[]) returns table (id int, parent_id int) as $$
        select * from node where node.parent_id = ANY ( p_ids );
    $$ language sql stable;
And then what is possibly my favourite Postgres feature:
    explain analyze select * from f(2);

    Seq Scan on node (cost=0.00..38.25 rows=11 width=8) (actual time=0.003..0.004 rows=2 loops=1)
      Filter: (parent_id = 2)
      Rows Removed by Filter: 9
    Planning time: 0.051 ms
    Execution time: 0.012 ms
    (5 rows)

    explain analyze select * from f2(ARRAY[2, 1]);

    Seq Scan on node (cost=0.00..38.25 rows=23 width=8) (actual time=0.004..0.005 rows=5 loops=1)
      Filter: (parent_id = ANY ('{2,1}'::integer[]))
      Rows Removed by Filter: 6
    Planning time: 0.055 ms
    Execution time: 0.013 ms
    (5 rows)

    explain analyze select * from g(ARRAY[2, 1]);

    CTE Scan on res (cost=424.46..431.28 rows=339 width=12) (actual time=0.026..0.046 rows=8 loops=1)
      Filter: (id IS NOT NULL)
      Rows Removed by Filter: 1
      CTE res
        -> Recursive Union (cost=0.00..424.46 rows=341 width=12) (actual time=0.002..0.042 rows=9 loops=1)
             -> Result (cost=0.00..0.01 rows=1 width=12) (actual time=0.000..0.000 rows=1 loops=1)
             -> Hash Join (cost=0.26..41.76 rows=34 width=12) (actual time=0.010..0.011 rows=3 loops=3)
                  Hash Cond: (node.parent_id = ('{2,1}'::integer[])[res_1.i])
                  -> Seq Scan on node (cost=0.00..32.60 rows=2260 width=8) (actual time=0.002..0.003 rows=11 loops=2)
                  -> Hash (cost=0.22..0.22 rows=3 width=4) (actual time=0.002..0.002 rows=1 loops=3)
                       Buckets: 1024 Batches: 1 Memory Usage: 8kB
                       -> WorkTable Scan on res res_1 (cost=0.00..0.22 rows=3 width=4) (actual time=0.001..0.001 rows=1 loops=3)
                            Filter: (i <= 2)
                            Rows Removed by Filter: 2
    Planning time: 0.255 ms
    Execution time: 0.083 ms
    (16 rows)
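Is there a better way of achieving this select-many, ideally a purely set-based approach that does not present an optimization barrier? I know that an index on parent_id would help if the table were bigger, but I am more interested in how the iterative query can be expressed differently to improve performance.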
This can be solved with unnest, or by modifying the original array-returning function to return table / setof. The function can then be used in a join.
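The unnest approach can be sketched as follows. This is a minimal sketch rather than the answerer's exact code: it assumes the node table and the function f from the question, PostgreSQL 9.3 or later (for lateral), and the names g2, x and val are hypothetical, chosen only for illustration.

```sql
-- g2 is a hypothetical name; node and f are the definitions from the question.
-- unnest turns the array into one row per element, and the lateral join
-- calls f once for each of those rows, concatenating the results.
create function g2(p_ids int[])
returns table (id int, parent_id int) as $$
    select tmp.id, tmp.parent_id
    from unnest(p_ids) as x(val)
    cross join lateral f(x.val) as tmp;
$$ language sql stable;

-- Same call as in the question; add "distinct" inside g2 if the input
-- array may contain repeated values and duplicates are unwanted.
select * from g2(ARRAY[2, 1]);
```

With the sample data this returns the five child rows of nodes 2 and 1, the same set that f2 produces. The plan for g in the question reports 8 rows because every row produced in one recursion step is joined again in the next, so with union all the recursive version also fans out into duplicates. Whether the planner inlines f inside the lateral join depends on the function and the PostgreSQL version, so it is worth checking with explain analyze.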