Postgresql
對序列中的分類值進行分組
我有一個按順序包含分類值的表,如下所示:
CREATE TABLE cat (name, v1, v2) AS VALUES ('John', 1::int, 3::int), ('John', 3, 4), ('John', 4, 9), ('Mike', 9, 11), ('Mike', 11, 17), ('John', 17, 24), ('John', 24, 25), ('Dave', 25, 29);
我想按名稱列對它們進行分組,得到最小值_1和最大值_2,但是將它們分組為名稱以不間斷的順序出現,如下所示:
+-----------+---------+---------+ | name | value_1 | value_2 | +-----------+---------+---------+ | John | 1 | 9 | | Mike | 9 | 17 | | John | 17 | 25 | | Dave | 25 | 29 | +-----------+---------+---------+
但是,進行正常的 GROUP BY 選擇會同時返回特定名稱的所有條目,無論它們是否一起出現:
SELECT name, MIN(value_1), MAX(value_2) FROM table GROUP BY name; +-----------+---------+---------+ | name | value_1 | value_2 | +-----------+---------+---------+ | John | 1 | 25 | | Mike | 9 | 17 | | Dave | 25 | 29 | +-----------+---------+---------+
現在,我可能可以使用 plpgsql 執行此操作,使用帶有變數、循環等的傳統程式邏輯,但我想知道是否有更多 SQL 風格的方法來解決這個問題。
軟體:PostgreSQL 9.6.3
使用
value_1
對行進行排序,您可以在每次名稱更改時進行分組。然後使用這個組來獲取
max
和min
值。select name, min(v1) v1, max(v2) v2 from (select name, v1, v2, sum(rst) over (order by v1) grp from (select name, v1, v2, case when coalesce(lag(name) over (order by v1), '') <> name then 1 end rst from cat ) y ) z group by name, grp order by v1;
姓名 | v1 | v2 :--- | -: | -: 約翰 | 1 | 9 邁克 | 9 | 17 約翰 | 17 | 25 戴夫 | 25 | 29
dbfiddle在這裡
我讀到這個不同。我假設範圍是鍊式的。例如
bob, 2, 5 -- no bob 5,7 bob, 7, 10 bob, 11, 12
應該導致兩個不同的範圍。為此有一個問題,您的架構是否允許重疊範圍?如果不是,我的假設,這應該有效。警告,如果您可以有上述範圍,這將不起作用
bob, 0, 10
,在這裡,我們計算與另一列滯後的重置。
SELECT name, v1, v2, COALESCE(v1<>lag(v2) OVER ( PARTITION BY name ORDER BY v1, v2 ), true) AS rst FROM cat; name | v1 | v2 | rst ------+----+----+----- Dave | 25 | 29 | t John | 1 | 3 | t John | 3 | 4 | f John | 4 | 9 | f John | 17 | 24 | t John | 24 | 25 | f Mike | 9 | 11 | t Mike | 11 | 17 | f (8 rows)
然後我們數得到組(
grp
),SELECT name, v1, v2, count(rst OR null) OVER (ORDER BY name, v1, v2) AS grp FROM ( SELECT name, v1, v2, COALESCE(v1<>lag(v2) OVER ( PARTITION BY name ORDER BY v1, v2 ), true) AS rst FROM cat ) AS t; name | v1 | v2 | grp ------+----+----+----- Dave | 25 | 29 | 1 John | 1 | 3 | 2 John | 3 | 4 | 2 John | 4 | 9 | 2 John | 17 | 24 | 3 John | 24 | 25 | 3 Mike | 9 | 11 | 4 Mike | 11 | 17 | 4 (8 rows)
然後我們按
grp
列分組以完成它。SELECT name, min(v1), max(v2) FROM ( SELECT name, v1, v2, count(rst OR null) OVER (ORDER BY name, v1, v2) AS grp FROM ( SELECT name, v1, v2, COALESCE(v1<>lag(v2) OVER ( PARTITION BY name ORDER BY v1, v2 ), true) AS rst FROM cat ) AS t1 ) AS t2 GROUP BY grp, name; name | min | max ------+-----+----- Dave | 25 | 29 John | 1 | 9 John | 17 | 25 Mike | 9 | 17 (4 rows)
行為差異..
INSERT INTO cat (name, v1, v2) VALUES ('Dave', 31, 35); name | min | max ------+-----+----- Dave | 25 | 29 Mike | 9 | 17 Dave | 31 | 35 John | 1 | 9 John | 17 | 25 (5 rows)