Gaps-and-Islands
如何在時間跨度內對數據進行分組
我有一組按日期排序的行,如下所示:
2017-01-01 2017-01-01 2017-01-02 2017-01-03 2017-01-04 2017-01-17 2017-01-18 2017-01-18 2017-01-18 2017-01-19 2017-01-19 2017-01-26 2017-01-27 2017-01-27
為願意提供幫助的人提供 DDL 和 DML:
CREATE TABLE foo ( DateCol date ); INSERT INTO foo ( DateCol ) VALUES ( '2017-01-01' ), ( '2017-01-01' ), ( '2017-01-02' ), ( '2017-01-03' ), ( '2017-01-04' ), ( '2017-01-17' ), ( '2017-01-18' ), ( '2017-01-18' ), ( '2017-01-18' ), ( '2017-01-19' ), ( '2017-01-19' ), ( '2017-01-26' ), ( '2017-01-27' ), ( '2017-01-27' ) );
我正在嘗試對行進行分組並添加一個額外的列來標記該組。標準是,如果它們之間的時間間隔小於 3 天,則任何兩行都應該在一組中,如下所示:
2017-01-01 A 2017-01-01 A 2017-01-02 A 2017-01-03 A 2017-01-04 A 2017-01-17 B 2017-01-18 B 2017-01-18 B 2017-01-18 B 2017-01-19 B 2017-01-19 B 2017-01-26 C 2017-01-27 C 2017-01-27 C
如何做到這一點?
窗函式法
SELECT date, chr( 65 + CAST(count(reset) OVER (ORDER BY date) AS int) ) AS dategrp FROM ( SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset FROM foo ORDER BY date ) AS t
首先,我們使用視窗函式生成重置
SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset FROM foo ORDER BY date date | reset ------------+------- 2017-01-01 | 2017-01-01 | 2017-01-02 | 2017-01-03 | 2017-01-04 | 2017-01-17 | 1 2017-01-18 | 2017-01-18 | 2017-01-18 | 2017-01-19 | 2017-01-19 | 2017-01-26 | 1 2017-01-27 | 2017-01-27 |
然後我們生成數字組
SELECT date, count(reset) OVER (ORDER BY date) FROM ( SELECT date, CASE WHEN date - lag(date) OVER () > 3 THEN 1 END AS reset FROM foo ORDER BY date ) AS t date | count ------------+------- 2017-01-01 | 0 2017-01-01 | 0 2017-01-02 | 0 2017-01-03 | 0 2017-01-04 | 0 2017-01-17 | 1 2017-01-18 | 1 2017-01-18 | 1 2017-01-18 | 1 2017-01-19 | 1 2017-01-19 | 1 2017-01-26 | 2 2017-01-27 | 2 2017-01-27 | 2
然後我們
chr(65+)
用來獲取您想要的 alpha 組名稱。原始查詢在頂部..date | dategrp ------------+--------- 2017-01-01 | A 2017-01-01 | A 2017-01-02 | A 2017-01-03 | A 2017-01-04 | A 2017-01-17 | B 2017-01-18 | B 2017-01-18 | B 2017-01-18 | B 2017-01-19 | B 2017-01-19 | B 2017-01-26 | C 2017-01-27 | C 2017-01-27 | C (14 rows)
這可以通過RECURSIVE Common Table Expression和一些帶有滯後
window function
的預處理來完成。首先,假設這是我們的數據:
CREATE TABLE ds ( d date NOT NULL ) ; INSERT INTO ds(d) VALUES ('2017-01-01'), ('2017-01-01'), ('2017-01-02'), ('2017-01-03'), ('2017-01-04'), ('2017-01-17'), ('2017-01-18'), ('2017-01-18'), ('2017-01-18'), ('2017-01-19'), ('2017-01-19'), ('2017-01-26'), ('2017-01-27'), ('2017-01-27') ;
有了這些數據,我們可以使用
lag
視窗函式“查看上一行”,並判斷距離是否小於 3 天。[在你的規格中。您談論下一個日期,但您的範例按**前一個日期分組]。SELECT d, coalesce(d - lag(d) over(order by d) >= 3, true) AS start_new_group, row_number() over (order by d) AS rn FROM ds ORDER BY d ;
這是我們在這一點上得到的(中間)結果:
| d | start_new_group | rn | |------------|-----------------|----| | 2017-01-01 | true | 1 | | 2017-01-01 | false | 2 | | 2017-01-02 | false | 3 | | 2017-01-03 | false | 4 | | 2017-01-04 | false | 5 | | 2017-01-17 | true | 6 | | 2017-01-18 | false | 7 | | 2017-01-18 | false | 8 | | 2017-01-18 | false | 9 | | 2017-01-19 | false | 10 | | 2017-01-19 | false | 11 | | 2017-01-26 | true | 12 | | 2017-01-27 | false | 13 | | 2017-01-27 | false | 14 |
現在我們知道何時開始下一組日期,我們可以使用 a
RECURSIVE CTE
,它還包括上一個查詢(我們稱之為d_start
)WITH RECURSIVE d_start AS (SELECT d, -- Here is where we decide when to start a new group coalesce(d - lag(d) over(order by d) >= 3, true) AS start_new_group, row_number() over (order by d) AS rn -- So that we can order by and join FROM ds ), d_group AS ( SELECT d, rn, 0 AS group_nr FROM d_start WHERE rn = 1 UNION ALL SELECT d_start.d, rn+1 AS rn, CASE WHEN d_start.start_new_group THEN group_nr+1 ELSE group_nr END AS group_nr FROM d_group -- We recurse at this point JOIN d_start USING(rn) -- and join the info of d_start ) SELECT d, group_nr FROM d_group WHERE rn > 1 -- We need to ignore the first row, because we have it twice ORDER BY rn ;
你得到的結果是:
| d | group_nr | |------------|----------| | 2017-01-01 | 1 | | 2017-01-01 | 1 | | 2017-01-02 | 1 | | 2017-01-03 | 1 | | 2017-01-04 | 1 | | 2017-01-17 | 2 | | 2017-01-18 | 2 | | 2017-01-18 | 2 | | 2017-01-18 | 2 | | 2017-01-19 | 2 | | 2017-01-19 | 2 | | 2017-01-26 | 3 | | 2017-01-27 | 3 | | 2017-01-27 | 3 |
我使用 1、2、3 作為組號(因為您可以擁有任意數量的組號,而單個字母則不是這樣)。從數字轉換為字母應該不難。
您可以在SQLFiddle檢查此查詢。它已經用 PostgreSQL 進行了測試。只需很少的修改(我想只是去掉這個
RECURSIVE
詞),它也可以與 MS SQL Server 和 Oracle 一起使用(至少)。