PostgreSQL

Forming groups of consecutive rows with the same value

  • September 13, 2021

I have a situation that I think can be solved with window functions, but I'm not sure.

Imagine the following table:

CREATE TABLE tmp (
 date timestamp
, id_type integer
) ;

INSERT INTO tmp (date, id_type)
VALUES
   ( '2017-01-10 07:19:21.0', 3 ),
   ( '2017-01-10 07:19:22.0', 3 ),
   ( '2017-01-10 07:19:23.1', 3 ),
   ( '2017-01-10 07:19:24.1', 3 ),
   ( '2017-01-10 07:19:25.0', 3 ),
   ( '2017-01-10 07:19:26.0', 5 ),
   ( '2017-01-10 07:19:27.1', 3 ),
   ( '2017-01-10 07:19:28.0', 5 ),
   ( '2017-01-10 07:19:29.0', 5 ),
   ( '2017-01-10 07:19:30.1', 3 ),
   ( '2017-01-10 07:19:31.0', 5 ),
   ( '2017-01-10 07:19:32.0', 3 ),
   ( '2017-01-10 07:19:33.1', 5 ),
   ( '2017-01-10 07:19:35.0', 5 ),
   ( '2017-01-10 07:19:36.1', 5 ),
   ( '2017-01-10 07:19:37.1', 5 );

I want a new group every time the value in column id_type changes. E.g. the first group runs from 7:19:21 to 7:19:25, the second starts and ends at 7:19:26, and so on.

Currently, using the query below…

SELECT distinct 
   min(min(date)) over w as begin, 
   max(max(date)) over w as end,   
   id_type
FROM tmp
GROUP BY id_type
WINDOW w AS (PARTITION BY id_type)
ORDER BY begin;

I get the following result:

begin                   end                     id_type
2017-01-10 07:19:21.0   2017-01-10 07:19:32.0   3
2017-01-10 07:19:26.0   2017-01-10 07:19:37.1   5
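To see why that query returns only two rows: GROUP BY id_type (and PARTITION BY id_type) puts every row that shares an id_type into one group, whether or not those rows are consecutive, so each group's min and max span the whole table. The Python sketch below models that behavior on the sample rows. It is an illustration, not PostgreSQL itself; the function name group_by_value_only is hypothetical, and dates are kept as fixed-width strings so that string order equals time order.

```python
# Sample rows from the question: (date, id_type). Fixed-width date strings
# sort lexicographically in the same order as the timestamps they represent.
ROWS = [
    ("2017-01-10 07:19:21.0", 3), ("2017-01-10 07:19:22.0", 3),
    ("2017-01-10 07:19:23.1", 3), ("2017-01-10 07:19:24.1", 3),
    ("2017-01-10 07:19:25.0", 3), ("2017-01-10 07:19:26.0", 5),
    ("2017-01-10 07:19:27.1", 3), ("2017-01-10 07:19:28.0", 5),
    ("2017-01-10 07:19:29.0", 5), ("2017-01-10 07:19:30.1", 3),
    ("2017-01-10 07:19:31.0", 5), ("2017-01-10 07:19:32.0", 3),
    ("2017-01-10 07:19:33.1", 5), ("2017-01-10 07:19:35.0", 5),
    ("2017-01-10 07:19:36.1", 5), ("2017-01-10 07:19:37.1", 5),
]


def group_by_value_only(rows):
    """Model GROUP BY id_type: one (begin, end, id_type) per distinct id_type."""
    ranges = {}  # id_type -> (min_date, max_date)
    for date, id_type in rows:
        lo, hi = ranges.get(id_type, (date, date))
        ranges[id_type] = (min(lo, date), max(hi, date))
    # ORDER BY begin
    return sorted((lo, hi, t) for t, (lo, hi) in ranges.items())


for row in group_by_value_only(ROWS):
    print(row)
```

Consecutiveness never enters the grouping key, which is exactly what the desired result needs.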

Whereas I want:

begin                   end                     id_type
2017-01-10 07:19:21.0   2017-01-10 07:19:25.0   3
2017-01-10 07:19:26.0   2017-01-10 07:19:26.0   5
2017-01-10 07:19:27.1   2017-01-10 07:19:27.1   3
2017-01-10 07:19:28.0   2017-01-10 07:19:29.0   5
2017-01-10 07:19:30.1   2017-01-10 07:19:30.1   3
2017-01-10 07:19:31.0   2017-01-10 07:19:31.0   5
2017-01-10 07:19:32.0   2017-01-10 07:19:32.0   3
2017-01-10 07:19:33.1   2017-01-10 07:19:37.1   5

Once this works, I want to include more criteria to define the groups, and these other criteria will be nullable.

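Postgres version: 8.4. We run Postgres with PostGIS, so upgrading isn't easy: PostGIS functions change names, and there are other issues. We have already rewritten everything, though, and the new release will use newer versions, 9.x and PostGIS 2.x.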

A few points:

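  • Don't call a table that isn't temporary tmp; it only confuses people.
  • Don't store timestamps as text. You are doing that in your example; we can tell because the timestamps aren't normalized and keep a trailing .0.
  • Don't call a column that holds a time date. If it has a date and a time, it is a timestamp (and should be stored as one).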

This is best done with window functions, which 8.4 already supports:

SELECT id_type, grp, min(date), max(date)
FROM (
 SELECT date, id_type, count(is_reset) OVER (ORDER BY date) AS grp
 FROM (
   SELECT date, id_type, CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
   FROM tmp
 ) AS t
) AS g
GROUP BY id_type, grp
ORDER BY min(date);

Output

 id_type | grp |          min          |          max          
---------+-----+-----------------------+-----------------------
       3 |   0 | 2017-01-10 07:19:21.0 | 2017-01-10 07:19:25.0
       5 |   1 | 2017-01-10 07:19:26.0 | 2017-01-10 07:19:26.0
       3 |   2 | 2017-01-10 07:19:27.1 | 2017-01-10 07:19:27.1
       5 |   3 | 2017-01-10 07:19:28.0 | 2017-01-10 07:19:29.0
       3 |   4 | 2017-01-10 07:19:30.1 | 2017-01-10 07:19:30.1
       5 |   5 | 2017-01-10 07:19:31.0 | 2017-01-10 07:19:31.0
       3 |   6 | 2017-01-10 07:19:32.0 | 2017-01-10 07:19:32.0
       5 |   7 | 2017-01-10 07:19:33.1 | 2017-01-10 07:19:37.1
(8 rows)
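The same gaps-and-islands logic can be modeled outside the database. The Python below is a hypothetical sketch of the three nested queries (the lag() reset flag, the running count, then GROUP BY id_type, grp) applied to the sample rows. It assumes rows are processed in date order, that dates are unique, and that id_type is never NULL; the function name islands is made up for illustration.

```python
# Sample rows from the question: (date, id_type), dates as fixed-width strings.
ROWS = [
    ("2017-01-10 07:19:21.0", 3), ("2017-01-10 07:19:22.0", 3),
    ("2017-01-10 07:19:23.1", 3), ("2017-01-10 07:19:24.1", 3),
    ("2017-01-10 07:19:25.0", 3), ("2017-01-10 07:19:26.0", 5),
    ("2017-01-10 07:19:27.1", 3), ("2017-01-10 07:19:28.0", 5),
    ("2017-01-10 07:19:29.0", 5), ("2017-01-10 07:19:30.1", 3),
    ("2017-01-10 07:19:31.0", 5), ("2017-01-10 07:19:32.0", 3),
    ("2017-01-10 07:19:33.1", 5), ("2017-01-10 07:19:35.0", 5),
    ("2017-01-10 07:19:36.1", 5), ("2017-01-10 07:19:37.1", 5),
]


def islands(rows):
    """Return (id_type, grp, min_date, max_date) per run, ordered by min_date."""
    ranges = {}  # (id_type, grp) -> [min_date, max_date]
    grp, prev = 0, None  # prev plays the role of lag(id_type)
    for date, id_type in sorted(rows):  # ORDER BY date
        # Reset flag + running count: a change of id_type starts a new group.
        if prev is not None and prev != id_type:
            grp += 1
        key = (id_type, grp)
        if key in ranges:
            ranges[key][1] = date  # rows arrive in date order, so this is the max
        else:
            ranges[key] = [date, date]
        prev = id_type
    # GROUP BY id_type, grp ... ORDER BY min(date)
    return sorted(
        ((t, g, lo, hi) for (t, g), (lo, hi) in ranges.items()),
        key=lambda r: r[2],
    )


for row in islands(ROWS):
    print(row)
```

Because grp only ever increases, each run gets its own number even when the same id_type reappears later.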

Explanation

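First we need the resets, which we generate with lag(). A row gets is_reset = 1 when its id_type differs from the id_type of the previous row in date order; otherwise is_reset is NULL, including on the first row, where lag() returns NULL.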

SELECT date, id_type, CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
FROM tmp
ORDER BY date;

         date          | id_type | is_reset 
-----------------------+---------+----------
 2017-01-10 07:19:21.0 |       3 |         
 2017-01-10 07:19:22.0 |       3 |         
 2017-01-10 07:19:23.1 |       3 |         
 2017-01-10 07:19:24.1 |       3 |         
 2017-01-10 07:19:25.0 |       3 |         
 2017-01-10 07:19:26.0 |       5 |        1
 2017-01-10 07:19:27.1 |       3 |        1
 2017-01-10 07:19:28.0 |       5 |        1
 2017-01-10 07:19:29.0 |       5 |         
 2017-01-10 07:19:30.1 |       3 |        1
 2017-01-10 07:19:31.0 |       5 |        1
 2017-01-10 07:19:32.0 |       3 |        1
 2017-01-10 07:19:33.1 |       5 |        1
 2017-01-10 07:19:35.0 |       5 |         
 2017-01-10 07:19:36.1 |       5 |         
 2017-01-10 07:19:37.1 |       5 |         
(16 rows)
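A minimal Python model of this first step, assuming the rows are walked in date order. None stands in for SQL NULL, and the function name reset_flags is hypothetical.

```python
# Sample rows from the question: (date, id_type), dates as fixed-width strings.
ROWS = [
    ("2017-01-10 07:19:21.0", 3), ("2017-01-10 07:19:22.0", 3),
    ("2017-01-10 07:19:23.1", 3), ("2017-01-10 07:19:24.1", 3),
    ("2017-01-10 07:19:25.0", 3), ("2017-01-10 07:19:26.0", 5),
    ("2017-01-10 07:19:27.1", 3), ("2017-01-10 07:19:28.0", 5),
    ("2017-01-10 07:19:29.0", 5), ("2017-01-10 07:19:30.1", 3),
    ("2017-01-10 07:19:31.0", 5), ("2017-01-10 07:19:32.0", 3),
    ("2017-01-10 07:19:33.1", 5), ("2017-01-10 07:19:35.0", 5),
    ("2017-01-10 07:19:36.1", 5), ("2017-01-10 07:19:37.1", 5),
]


def reset_flags(rows):
    """Model CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END.

    Returns (date, id_type, is_reset) tuples; is_reset is 1 or None (NULL).
    """
    out = []
    prev = None  # lag() yields NULL on the first row
    for date, id_type in sorted(rows):  # ORDER BY date
        # NULL <> id_type is NULL and CASE without ELSE gives NULL,
        # so the first row is never flagged.
        is_reset = 1 if prev is not None and prev != id_type else None
        out.append((date, id_type, is_reset))
        prev = id_type
    return out


for row in reset_flags(ROWS):
    print(row)
```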

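Then we count the resets to get the groups. count(is_reset) skips NULLs, so the running count ordered by date goes up by one at every reset, and all rows in the same run share the same grp.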

SELECT date, id_type, count(is_reset) OVER (ORDER BY date) AS grp
FROM (
 SELECT date, id_type, CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
 FROM tmp
 ORDER BY date
) AS t
ORDER BY date;

         date          | id_type | grp 
-----------------------+---------+-----
 2017-01-10 07:19:21.0 |       3 |   0
 2017-01-10 07:19:22.0 |       3 |   0
 2017-01-10 07:19:23.1 |       3 |   0
 2017-01-10 07:19:24.1 |       3 |   0
 2017-01-10 07:19:25.0 |       3 |   0
 2017-01-10 07:19:26.0 |       5 |   1
 2017-01-10 07:19:27.1 |       3 |   2
 2017-01-10 07:19:28.0 |       5 |   3
 2017-01-10 07:19:29.0 |       5 |   3
 2017-01-10 07:19:30.1 |       3 |   4
 2017-01-10 07:19:31.0 |       5 |   5
 2017-01-10 07:19:32.0 |       3 |   6
 2017-01-10 07:19:33.1 |       5 |   7
 2017-01-10 07:19:35.0 |       5 |   7
 2017-01-10 07:19:36.1 |       5 |   7
 2017-01-10 07:19:37.1 |       5 |   7
(16 rows)
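The running count can be modeled as a counter that increments whenever a row's is_reset would be non-NULL. This is a hypothetical Python sketch (group_numbers is a made-up name); it assumes dates are unique, since with ties the SQL default window frame would also count resets on peer rows sharing the same date.

```python
# Sample rows from the question: (date, id_type), dates as fixed-width strings.
ROWS = [
    ("2017-01-10 07:19:21.0", 3), ("2017-01-10 07:19:22.0", 3),
    ("2017-01-10 07:19:23.1", 3), ("2017-01-10 07:19:24.1", 3),
    ("2017-01-10 07:19:25.0", 3), ("2017-01-10 07:19:26.0", 5),
    ("2017-01-10 07:19:27.1", 3), ("2017-01-10 07:19:28.0", 5),
    ("2017-01-10 07:19:29.0", 5), ("2017-01-10 07:19:30.1", 3),
    ("2017-01-10 07:19:31.0", 5), ("2017-01-10 07:19:32.0", 3),
    ("2017-01-10 07:19:33.1", 5), ("2017-01-10 07:19:35.0", 5),
    ("2017-01-10 07:19:36.1", 5), ("2017-01-10 07:19:37.1", 5),
]


def group_numbers(rows):
    """Model count(is_reset) OVER (ORDER BY date) as a running counter.

    Returns (date, id_type, grp) tuples, assuming unique dates.
    """
    out = []
    grp = 0
    prev = None
    for date, id_type in sorted(rows):  # ORDER BY date
        if prev is not None and prev != id_type:
            grp += 1  # is_reset = 1 here; count() adds it
        out.append((date, id_type, grp))  # NULL resets leave grp unchanged
        prev = id_type
    return out


for row in group_numbers(ROWS):
    print(row)
```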

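Finally, we wrap that in a subselect, GROUP BY id_type and grp, and take min(date) and max(date) to get each group's range, ordered by where it starts: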

SELECT id_type, grp, min(date), max(date)
FROM (
 .. stuff
) AS g
GROUP BY id_type, grp
ORDER BY min(date);
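The question also mentions adding more, nullable criteria to the grouping. In SQL, `<>` returns NULL when either side is NULL, so a change from or to NULL in an extra column would not register as a reset; `IS DISTINCT FROM` compares NULLs as ordinary values instead. The Python below is a hypothetical sketch of that null-safe idea: a new run starts whenever the whole tuple of criteria differs from the previous row's. The zone column and its values are invented for illustration, and islands_multi is a made-up name.

```python
# Invented sample rows: (date, id_type, zone). "zone" is a hypothetical,
# nullable extra criterion; None plays the role of SQL NULL.
ROWS = [
    ("2017-01-10 07:19:21.0", 3, None),
    ("2017-01-10 07:19:22.0", 3, None),
    ("2017-01-10 07:19:23.1", 3, "A"),
    ("2017-01-10 07:19:24.1", 3, "A"),
    ("2017-01-10 07:19:25.0", 3, None),
    ("2017-01-10 07:19:26.0", 5, None),
]


def islands_multi(rows):
    """Group consecutive rows (in date order) whose criteria tuple is unchanged.

    Each row is (date, *criteria). Python's != treats None as equal to None
    and different from any other value, like SQL's IS DISTINCT FROM rather
    than <> (where any comparison with NULL yields NULL).
    """
    groups = []  # each entry: [criteria_tuple, min_date, max_date]
    for date, *criteria in sorted(rows, key=lambda r: r[0]):  # ORDER BY date
        key = tuple(criteria)
        if not groups or groups[-1][0] != key:
            groups.append([key, date, date])  # a new run starts here
        else:
            groups[-1][2] = date  # same run; dates ascend, so extend the max
    return [tuple(g) for g in groups]


for row in islands_multi(ROWS):
    print(row)
```

Here the switch from "A" back to None starts a new run, which a plain `<>` test on the zone column would miss.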

Source: https://dba.stackexchange.com/questions/166374