Postgresql

具有交叉(橫向)連接的多個聚合

  • July 4, 2019

結合 array_agg 和 unnest之後,給定以下數據:

key |  a | b | c
------------------
1  |  0 | 1 | {1,2}
1  |  1 | 2 | {3}
1  | -1 | 3 | {2}
1  |  2 | 4 | {}

執行此查詢:

SELECT 
 d.key,
 min(d.a) AS a,
 sum(d.b) AS b,
 array_agg(DISTINCT x.c) AS c 
FROM data AS d
 CROSS JOIN LATERAL unnest(d.c) AS x(c)
GROUP BY d.key

給出了意想不到的結果:

key |  a | b | c
------------------
1  | -1 | 7 | {1,2,3}

這裡發生了什麼,以及如何獲得正確的總和?


答案表現

根據我的實際數據(16642 行、1942 個鍵、6 個聚合),我得到了每個建議解決方案的這些成本估算。

取消嵌套為 (key = 1, a = 0, b = 1) 生成 2 行,並且交叉連接刪除具有空數組的行。

因此,您的 group by 對以下集合進行操作:

key | a  | b | c
----+----+---+--
 1 |  0 | 1 | 1
 1 |  0 | 1 | 2
 1 |  1 | 2 | 3
 1 | -1 | 3 | 2

一種解決方案是組合兩個分組查詢,每個查詢分組不同級別:

select *
from (
 select d1."key", min(d1.a), sum(d1.b)
 from data d1
 group by d1."key"
) m
 join (
   select "key", array_agg(DISTINCT x.c) AS c 
   from data d2
     left join lateral unnest(d2.c) as x(c) on true
   where x.c is not null
   group by "key"
 ) a on a."key" = m."key";

另一種方法是僅在聚合中包含每個“未嵌套”組的“第一行”:

select d."key", 
      min(d.a) filter (where idx = 1 or idx is null), 
      sum(d.b) filter (where idx = 1 or idx is null),
      array_agg(distinct x.c)
from data AS d
 left join lateral unnest(d.c) with ordinality  AS x(c,idx) on true
group by d."key";

with ordinality返回原始數組中未嵌套元素的位置。對於具有空數組的行,它將為空。

引用自:https://dba.stackexchange.com/questions/242101