Grouping on any one of multiple columns in Postgres

December 11, 2016

Is it possible to create some kind of grouping chain in Postgres? Suppose I have the following table:
CREATE TABLE foo AS
SELECT row_number() OVER () AS id, *
FROM ( VALUES
 ( 'X', 'D', 'G', 'P' ),
 ( 'F', 'D', 'L', 'M' ),
 ( 'X', 'N', 'R', 'S' ),
 ( 'Y', 'I', 'W', NULL ),
 ( 'U', 'Z', 'E', NULL )
) AS f(a,b,c,d);

id | a | b | c | d
------------------
1 | X | D | G | P
2 | F | D | L | M
3 | X | N | R | S
4 | Y | I | W | 
5 | U | Z | E | 
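I would like to write a GROUP BY that yields three groups:
1, 2 and 3 together:
  1 and 2 because of the common D in column b
  1 and 3 because of the common X in column a
4 by itself (no value in common in any column; nulls should not match)
5 by itself (no value in common in any column; nulls should not match)
I am currently on Postgres 9.5, but we will eventually upgrade to 9.6, so if there is anything there that would help me, I would be happy to hear about it.
In other words, I am looking for something like this (assuming I use array_agg(DISTINCT a) and so on to keep the display simpler):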
  ids    |     as     |     bs     |       cs        |      ds
-----------------------------------------------------------------------
{1, 2, 3} | {'X', 'F'} | {'D', 'N'} | {'G', 'L', 'R'} | {'P', 'M', 'S'}
{4}       | {'Y'}      | {'I'}      | {'W'}           | {NULL}
{5}       | {'U'}      | {'Z'}      | {'E'}           | {NULL}
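(I am not entirely sure how the nulls would be displayed, so don't pay too much attention to that; the important point is that they should not match each other.)
When I use GROUP BY CUBE (a, b, c, d), I get more than three results, and the same goes for GROUP BY ROLLUP and GROUP BY GROUPING SETS.
Is there an elegant way to do this in Postgres? I can imagine how you would do it in Ruby via Active Record (loop through each record, grouping it with previously matched grouping sets), but I would like to keep this in Postgres if possible.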

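Another recursive solution:

First create the adjacency list of the connected graph of ids,
then find its transitive closure (this is the recursive part),
then group by (once) to find the connected component each node belongs to,
and join to the table again and group by (again) to gather the values from all the nodes of each connected component.

Initial data (copied from Jack Douglas's solution):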
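
The first step can be tried on its own. A minimal sketch, assuming the foo table from the question (or the initial data below) is already in place; the column aliases tail and head follow the query further down:

```sql
-- Adjacency list: one row per ordered pair of ids that share a value
-- in at least one column. NULL = NULL evaluates to NULL, not true,
-- so the NULLs in column d never link rows 4 and 5.
select f.id as tail, g.id as head
from foo as f join foo as g
  on (f.a = g.a or f.b = g.b or f.c = g.c or f.d = g.d)
order by tail, head;
```

On the sample data this should return nine pairs: each id paired with itself, plus (1,2), (2,1), (1,3) and (3,1).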

begin;
create schema stack;
set search_path=stack;

create table foo as
select *
from (values (1,'X','D','G','P')
          , (2,'F','D','L','M')
          , (3,'X','N','R','S')
          , (4,'Y','I','W',null)
          , (5,'U','Z','E',null) ) AS f(id,a,b,c,d);

Query:

with recursive 
 al (tail, head) as                     -- adjacency list 
 ( select f.id, g.id 
   from foo as f join foo as g
     on (f.a = g.a or f.b = g.b or f.c = g.c or f.d = g.d) 
 ),
 tc (tail, head) as                     -- transitive closure
 ( select * from al
   union distinct
   select f.tail, g.head 
   from al as f join tc as g on f.head = g.tail
 ) ,
 cc (head, ids) as                      -- group once
 ( select head, array_agg(distinct tail order by tail) as ids
   from tc
   group by head
 ) 
select                                   -- group twice
   ids,
   array_agg(distinct a order by a) as a,
   array_agg(distinct b order by b) as b,
   array_agg(distinct c order by c) as c,
   array_agg(distinct d order by d) as d
from
 cc join foo on cc.head = foo.id
group by ids ;

┌─────────┬───────┬───────┬─────────┬─────────┐
│   ids   │   a   │   b   │    c    │    d    │
├─────────┼───────┼───────┼─────────┼─────────┤
│ {1,2,3} │ {F,X} │ {D,N} │ {G,L,R} │ {M,P,S} │
│ {4}     │ {Y}   │ {I}   │ {W}     │ {NULL}  │
│ {5}     │ {U}   │ {Z}   │ {E}     │ {NULL}  │
└─────────┴───────┴───────┴─────────┴─────────┘
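
A possible variation, offered as a sketch and not as part of the original answer: because the join condition is symmetric, so is the transitive closure, and the smallest id reachable from a row can serve as a label for its connected component. The final aggregation then needs only one GROUP BY on that label. The comp CTE and the component column are names introduced here for illustration.

```sql
with recursive
 al (tail, head) as                     -- adjacency list, as above
 ( select f.id, g.id
   from foo as f join foo as g
     on (f.a = g.a or f.b = g.b or f.c = g.c or f.d = g.d)
 ),
 tc (tail, head) as                     -- transitive closure, as above
 ( select * from al
   union distinct
   select f.tail, g.head
   from al as f join tc as g on f.head = g.tail
 ),
 comp (id, component) as                -- label = smallest reachable id
 ( select tail, min(head)
   from tc
   group by tail
 )
select
   array_agg(id order by id) as ids,
   array_agg(distinct a order by a) as a,
   array_agg(distinct b order by b) as b,
   array_agg(distinct c order by c) as c,
   array_agg(distinct d order by d) as d
from
 comp join foo using (id)
group by component
order by component;
```

On the sample data the labels come out as 1 for rows 1 to 3, 4 for row 4 and 5 for row 5, so the result should match the three rows shown above.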

Cleanup:

rollback;

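Assuming you are after a general solution, I don't think there is any non-recursive way to solve your problem. If your real-world problem needs to handle a large number of rows, you may have your work cut out getting a solution that scales well enough.

Test schema and data: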

begin;
create schema stack;
set search_path=stack;

create table foo as
select *
from (values (1,'X','D','G','P')
          , (2,'F','D','L','M')
          , (3,'X','N','R','S')
          , (4,'Y','I','W',null)
          , (5,'U','Z','E',null) ) AS f(id,a,b,c,d);

Solution:

with recursive t(id,a,b,c,d,start,path,cycle) as (
 -- seed: every row starts a walk at itself
 select *, id, array[id], false from foo
 union all
 -- step: extend each walk to any other row that shares a value,
 -- and flag the walk once it revisits an id already on its path
 select f.*, start, path||f.id, f.id=any(path)
 from foo f join t 
   on f.id<>t.id and
      (f.a=t.a or f.b=t.b or f.c=t.c or f.d=t.d) where not cycle )
select array_agg(f.id order by f.id) ids
    , array_agg(distinct a order by a) a
    , array_agg(distinct b order by b) b
    , array_agg(distinct c order by c) c
    , array_agg(distinct d order by d) d
-- z holds, for each starting row, the ids reachable from it
from foo f join ( select start id, array_agg(id order by id) ids
                 from t
                 where not cycle group by start) z on z.id=f.id
group by ids::text;

┌─────────┬───────┬───────┬─────────┬─────────┐
│   ids   │   a   │   b   │    c    │    d    │
├─────────┼───────┼───────┼─────────┼─────────┤
│ {1,2,3} │ {F,X} │ {D,N} │ {G,L,R} │ {M,P,S} │
│ {4}     │ {Y}   │ {I}   │ {W}     │ {NULL}  │
│ {5}     │ {U}   │ {Z}   │ {E}     │ {NULL}  │
└─────────┴───────┴───────┴─────────┴─────────┘
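
To see how path and cycle evolve, the recursive CTE can be queried directly for a single starting row. A sketch, assuming the test data above is loaded; it reuses the CTE from the solution unchanged and only swaps the final select:

```sql
with recursive t(id,a,b,c,d,start,path,cycle) as (
 select *, id, array[id], false from foo
 union all
 select f.*, start, path||f.id, f.id=any(path)
 from foo f join t
   on f.id<>t.id and
      (f.a=t.a or f.b=t.b or f.c=t.c or f.d=t.d) where not cycle )
-- every walk that starts at row 1, including the ones flagged as cycles
select start, id, path, cycle
from t
where start = 1
order by path;
```

For start = 1 this should list the paths {1}, {1,2}, {1,2,1}, {1,3} and {1,3,1}. Only the two that return to 1 have cycle set, which is why the solution filters on not cycle before aggregating.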

Cleanup:

rollback;

Source: https://dba.stackexchange.com/questions/157715
