Postgresql

對序列中的分類值進行分組

  • February 6, 2018

我有一個按順序包含分類值的表,如下所示:

CREATE TABLE cat (name, v1, v2)
AS VALUES
   ('John', 1::int, 3::int),
   ('John', 3, 4),
   ('John', 4, 9),
   ('Mike', 9, 11),
   ('Mike', 11, 17),
   ('John', 17, 24),
   ('John', 24, 25),
   ('Dave', 25, 29);

我想按名稱列對它們進行分組,得到最小值_1和最大值_2,但是將它們分組為名稱以不間斷的順序出現,如下所示:

+-----------+---------+---------+
| name      | value_1 | value_2 |
+-----------+---------+---------+
| John      | 1       | 9       |
| Mike      | 9       | 17      |
| John      | 17      | 25      |
| Dave      | 25      | 29      |
+-----------+---------+---------+

但是,進行正常的 GROUP BY 選擇會同時返回特定名稱的所有條目,無論它們是否一起出現:

SELECT name, MIN(value_1), MAX(value_2)
FROM table
GROUP BY name;

+-----------+---------+---------+
| name      | value_1 | value_2 |
+-----------+---------+---------+
| John      | 1       | 25      |
| Mike      | 9       | 17      |
| Dave      | 25      | 29      |
+-----------+---------+---------+

現在,我可能可以使用 plpgsql 執行此操作,使用帶有變數、循環等的傳統程式邏輯,但我想知道是否有更多 SQL 風格的方法來解決這個問題。

軟體:PostgreSQL 9.6.3

使用value_1對行進行排序,您可以在每次名稱更改時進行分組。

然後使用這個組來獲取maxmin值。

select name, min(v1) v1, max(v2) v2
from (select name, v1, v2,
             sum(rst) over (order by v1) grp
      from (select name, v1, v2,
                   case when coalesce(lag(name) over (order by v1), '') <> name then 1 end rst
            from cat
            ) y
      ) z
group by name, grp
order by v1;
姓名 | v1 | v2
:--- | -: | -:
約翰 | 1 | 9
邁克 | 9 | 17
約翰 | 17 | 25
戴夫 | 25 | 29

dbfiddle在這裡

我讀到這個不同。我假設範圍是鍊式的。例如

bob, 2, 5
-- no bob 5,7
bob, 7,  10
bob, 11, 12

應該導致兩個不同的範圍。為此有一個問題,您的架構是否允許重疊範圍?如果不是,我的假設,這應該有效。警告,如果您可以有上述範圍,這將不起作用bob, 0, 10

在這裡,我們計算與另一列滯後的重置。

SELECT name, v1, v2, COALESCE(v1<>lag(v2) OVER (
 PARTITION BY name
 ORDER BY v1, v2
), true) AS rst
FROM cat;
name | v1 | v2 | rst 
------+----+----+-----
Dave | 25 | 29 | t
John |  1 |  3 | t
John |  3 |  4 | f
John |  4 |  9 | f
John | 17 | 24 | t
John | 24 | 25 | f
Mike |  9 | 11 | t
Mike | 11 | 17 | f
(8 rows)

然後我們數得到組(grp),

SELECT name, v1, v2, count(rst OR null) OVER (ORDER BY name, v1, v2) AS grp
FROM (
       SELECT name, v1, v2, COALESCE(v1<>lag(v2) OVER (
               PARTITION BY name
               ORDER BY v1, v2
       ), true) AS rst
       FROM cat
) AS t;
name | v1 | v2 | grp 
------+----+----+-----
Dave | 25 | 29 |   1
John |  1 |  3 |   2
John |  3 |  4 |   2
John |  4 |  9 |   2
John | 17 | 24 |   3
John | 24 | 25 |   3
Mike |  9 | 11 |   4
Mike | 11 | 17 |   4
(8 rows)

然後我們按grp列分組以完成它。

SELECT name, min(v1), max(v2)
FROM (
       SELECT name, v1, v2, count(rst OR null) OVER (ORDER BY name, v1, v2) AS grp
       FROM (
               SELECT name, v1, v2, COALESCE(v1<>lag(v2) OVER (
                       PARTITION BY name
                       ORDER BY v1, v2
               ), true) AS rst
               FROM cat
       ) AS t1
) AS t2
GROUP BY grp, name;
name | min | max 
------+-----+-----
Dave |  25 |  29
John |   1 |   9
John |  17 |  25
Mike |   9 |  17
(4 rows)

行為差異..

INSERT INTO cat (name, v1, v2)
VALUES ('Dave', 31, 35);

name | min | max 
------+-----+-----
Dave |  25 |  29
Mike |   9 |  17
Dave |  31 |  35
John |   1 |   9
John |  17 |  25
(5 rows)

引用自:https://dba.stackexchange.com/questions/197111