Postgresql
具有交叉(橫向)連接的多個聚合
繼結合 array_agg 和 unnest之後,給定以下數據:
key | a | b | c ------------------ 1 | 0 | 1 | {1,2} 1 | 1 | 2 | {3} 1 | -1 | 3 | {2} 1 | 2 | 4 | {}
執行此查詢:
SELECT d.key, min(d.a) AS a, sum(d.b) AS b, array_agg(DISTINCT x.c) AS c FROM data AS d CROSS JOIN LATERAL unnest(d.c) AS x(c) GROUP BY d.key
給出了意想不到的結果:
key | a | b | c ------------------ 1 | -1 | 7 | {1,2,3}
這裡發生了什麼,以及如何獲得正確的總和?
答案表現
根據我的實際數據(16642 行、1942 個鍵、6 個聚合),我得到了每個建議解決方案的這些成本估算。
- a_horse_with_no_name選項 1:
1761.12..49370.52
- a_horse_with_no_name選項 2:
0.57..89214.72
- 歐文·布蘭德施泰特:
1761.12..49370.61
取消嵌套為 (key = 1, a = 0, b = 1) 生成 2 行,並且交叉連接刪除具有空數組的行。
因此,您的 group by 對以下集合進行操作:
key | a | b | c ----+----+---+-- 1 | 0 | 1 | 1 1 | 0 | 1 | 2 1 | 1 | 2 | 3 1 | -1 | 3 | 2
一種解決方案是組合兩個分組查詢,每個查詢分組不同級別:
select * from ( select d1."key", min(d1.a), sum(d1.b) from data d1 group by d1."key" ) m join ( select "key", array_agg(DISTINCT x.c) AS c from data d2 left join lateral unnest(d2.c) as x(c) on true where x.c is not null group by "key" ) a on a."key" = m."key";
另一種方法是僅在聚合中包含每個“未嵌套”組的“第一行”:
select d."key", min(d.a) filter (where idx = 1 or idx is null), sum(d.b) filter (where idx = 1 or idx is null), array_agg(distinct x.c) from data AS d left join lateral unnest(d.c) with ordinality AS x(c,idx) on true group by d."key";
with ordinality
返回原始數組中未嵌套元素的位置。對於具有空數組的行,它將為空。