Gaps-and-Islands

如何在時間跨度內對數據進行分組

  • May 9, 2018

我有一組按日期排序的行,如下所示:

2017-01-01 
2017-01-01
2017-01-02
2017-01-03
2017-01-04
2017-01-17
2017-01-18
2017-01-18
2017-01-18
2017-01-19
2017-01-19
2017-01-26
2017-01-27
2017-01-27

為願意提供幫助的人提供 DDL 和 DML:

CREATE TABLE foo ( DateCol date );
INSERT INTO foo ( DateCol )
VALUES
 ( '2017-01-01' ),
 ( '2017-01-01' ),
 ( '2017-01-02' ),
 ( '2017-01-03' ),
 ( '2017-01-04' ),
 ( '2017-01-17' ),
 ( '2017-01-18' ),
 ( '2017-01-18' ),
 ( '2017-01-18' ),
 ( '2017-01-19' ),
 ( '2017-01-19' ),
 ( '2017-01-26' ),
 ( '2017-01-27' ),
 ( '2017-01-27' )
);

我正在嘗試對行進行分組並添加一個額外的列來標記該組。標準是,如果它們之間的時間間隔小於 3 天,則任何兩行都應該在一組中,如下所示:

2017-01-01  A
2017-01-01  A
2017-01-02  A
2017-01-03  A
2017-01-04  A
2017-01-17  B
2017-01-18  B
2017-01-18  B
2017-01-18  B
2017-01-19  B
2017-01-19  B
2017-01-26  C
2017-01-27  C
2017-01-27  C

如何做到這一點?

窗函式法

SELECT date,
 chr(
   65 + CAST(count(reset) OVER (ORDER BY date) AS int)
 ) AS dategrp
FROM (
 SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset
 FROM foo
 ORDER BY date
) AS t

首先,我們使用視窗函式生成重置

SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset
FROM foo
ORDER BY date
   date    | reset 
------------+-------
2017-01-01 |      
2017-01-01 |      
2017-01-02 |      
2017-01-03 |      
2017-01-04 |      
2017-01-17 |     1
2017-01-18 |      
2017-01-18 |      
2017-01-18 |      
2017-01-19 |      
2017-01-19 |      
2017-01-26 |     1
2017-01-27 |      
2017-01-27 |      

然後我們生成數字組

SELECT date, count(reset) OVER (ORDER BY date)
FROM (
 SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset
 FROM foo
 ORDER BY date
) AS t
   date    | count 
------------+-------
2017-01-01 |     0
2017-01-01 |     0
2017-01-02 |     0
2017-01-03 |     0
2017-01-04 |     0
2017-01-17 |     1
2017-01-18 |     1
2017-01-18 |     1
2017-01-18 |     1
2017-01-19 |     1
2017-01-19 |     1
2017-01-26 |     2
2017-01-27 |     2
2017-01-27 |     2

然後我們chr(65+)用來獲取您想要的 alpha 組名稱。原始查詢在頂部..

   date    | dategrp 
------------+---------
2017-01-01 | A
2017-01-01 | A
2017-01-02 | A
2017-01-03 | A
2017-01-04 | A
2017-01-17 | B
2017-01-18 | B
2017-01-18 | B
2017-01-18 | B
2017-01-19 | B
2017-01-19 | B
2017-01-26 | C
2017-01-27 | C
2017-01-27 | C
(14 rows)

這可以通過RECURSIVE Common Table Expression和一些帶有滯後 window function的預處理來完成。

首先,假設這是我們的數據:

CREATE TABLE ds
(
   d date NOT NULL
) ;

INSERT INTO 
   ds(d)
VALUES 
   ('2017-01-01'),
   ('2017-01-01'),
   ('2017-01-02'),
   ('2017-01-03'),
   ('2017-01-04'),
   ('2017-01-17'),
   ('2017-01-18'),
   ('2017-01-18'),
   ('2017-01-18'),
   ('2017-01-19'),
   ('2017-01-19'),
   ('2017-01-26'),
   ('2017-01-27'),
   ('2017-01-27') ;

有了這些數據,我們可以使用lag視窗函式“查看上一行”,並判斷距離是否小於 3 天。[在你的規格中。您談論下一個日期,但您的範例按**前一個日期分組]。

SELECT
     d, 
     coalesce(d - lag(d) over(order by d) >= 3, true) AS start_new_group, 
     row_number() over (order by d) AS rn
FROM
     ds
ORDER BY 
     d ;

這是我們在這一點上得到的(中間)結果:

|          d | start_new_group | rn |
|------------|-----------------|----|
| 2017-01-01 |            true |  1 |
| 2017-01-01 |           false |  2 |
| 2017-01-02 |           false |  3 |
| 2017-01-03 |           false |  4 |
| 2017-01-04 |           false |  5 |
| 2017-01-17 |            true |  6 |
| 2017-01-18 |           false |  7 |
| 2017-01-18 |           false |  8 |
| 2017-01-18 |           false |  9 |
| 2017-01-19 |           false | 10 |
| 2017-01-19 |           false | 11 |
| 2017-01-26 |            true | 12 |
| 2017-01-27 |           false | 13 |
| 2017-01-27 |           false | 14 |

現在我們知道何時開始下一組日期,我們可以使用 a RECURSIVE CTE,它還包括上一個查詢(我們稱之為d_start

WITH RECURSIVE 
d_start AS
(SELECT
   d, 
   -- Here is where we decide when to start a new group
   coalesce(d - lag(d) over(order by d) >= 3, true) AS start_new_group, 
   row_number() over (order by d) AS rn  -- So that we can order by and join
FROM
   ds
),
d_group AS
(
SELECT
   d,          rn,
   0 AS group_nr
FROM
   d_start
WHERE
   rn = 1
UNION ALL
SELECT
   d_start.d,  rn+1 AS rn,
   CASE WHEN d_start.start_new_group 
       THEN group_nr+1
       ELSE group_nr
   END AS group_nr
FROM
   d_group   -- We recurse at this point
   JOIN d_start USING(rn)  -- and join the info of d_start               
)
SELECT
   d, group_nr
FROM
   d_group
WHERE
   rn > 1 -- We need to ignore the first row, because we have it twice
ORDER BY 
   rn ;

你得到的結果是:

|          d | group_nr |
|------------|----------|
| 2017-01-01 |        1 |
| 2017-01-01 |        1 |
| 2017-01-02 |        1 |
| 2017-01-03 |        1 |
| 2017-01-04 |        1 |
| 2017-01-17 |        2 |
| 2017-01-18 |        2 |
| 2017-01-18 |        2 |
| 2017-01-18 |        2 |
| 2017-01-19 |        2 |
| 2017-01-19 |        2 |
| 2017-01-26 |        3 |
| 2017-01-27 |        3 |
| 2017-01-27 |        3 |

我使用 1、2、3 作為組號(因為您可以擁有任意數量的組號,而單個字母則不是這樣)。從數字轉換為字母應該不難。

您可以在SQLFiddle檢查此查詢。它已經用 PostgreSQL 進行了測試。只需很少的修改(我想只是去掉這個RECURSIVE詞),它也可以與 MS SQL Server 和 Oracle 一起使用(至少)。

引用自:https://dba.stackexchange.com/questions/163819