PostgreSQL
Forming groups of consecutive rows with the same value
I have a situation that I think can be solved with window functions, but I'm not sure.
Imagine the following table:
CREATE TABLE tmp (
    date    timestamp,
    id_type integer
);

INSERT INTO tmp (date, id_type) VALUES
    ('2017-01-10 07:19:21.0', 3),
    ('2017-01-10 07:19:22.0', 3),
    ('2017-01-10 07:19:23.1', 3),
    ('2017-01-10 07:19:24.1', 3),
    ('2017-01-10 07:19:25.0', 3),
    ('2017-01-10 07:19:26.0', 5),
    ('2017-01-10 07:19:27.1', 3),
    ('2017-01-10 07:19:28.0', 5),
    ('2017-01-10 07:19:29.0', 5),
    ('2017-01-10 07:19:30.1', 3),
    ('2017-01-10 07:19:31.0', 5),
    ('2017-01-10 07:19:32.0', 3),
    ('2017-01-10 07:19:33.1', 5),
    ('2017-01-10 07:19:35.0', 5),
    ('2017-01-10 07:19:36.1', 5),
    ('2017-01-10 07:19:37.1', 5);
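I want a new group to start every time the value in the column id_type changes. E.g. the first group runs from 7:19:21 to 7:19:25, the second group starts and ends at 7:19:26, and so on. At the moment, using the query below…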
SELECT DISTINCT
    min(min(date)) OVER w AS begin,
    max(max(date)) OVER w AS end,
    id_type
FROM tmp
GROUP BY id_type
WINDOW w AS (PARTITION BY id_type)
ORDER BY begin;
I get the following result:
 begin                 | end                   | id_type
-----------------------+-----------------------+---------
 2017-01-10 07:19:21.0 | 2017-01-10 07:19:32.0 |       3
 2017-01-10 07:19:26.0 | 2017-01-10 07:19:37.1 |       5
Whereas I want:
 begin                 | end                   | id_type
-----------------------+-----------------------+---------
 2017-01-10 07:19:21.0 | 2017-01-10 07:19:25.0 |       3
 2017-01-10 07:19:26.0 | 2017-01-10 07:19:26.0 |       5
 2017-01-10 07:19:27.1 | 2017-01-10 07:19:27.1 |       3
 2017-01-10 07:19:28.0 | 2017-01-10 07:19:29.0 |       5
 2017-01-10 07:19:30.1 | 2017-01-10 07:19:30.1 |       3
 2017-01-10 07:19:31.0 | 2017-01-10 07:19:31.0 |       5
 2017-01-10 07:19:32.0 | 2017-01-10 07:19:32.0 |       3
 2017-01-10 07:19:33.1 | 2017-01-10 07:19:37.1 |       5
Once this works, I want to include more criteria to define the groups, and those other criteria will be nullable.
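Postgres version: 8.4. We run Postgres with PostGIS, so upgrading is not easy. PostGIS functions changed names and there are other problems, but we have already rewritten everything, and the new version will use a newer 9.x release with PostGIS 2.x.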
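A few points first:
- Don't name a non-temporary table tmp; it only causes confusion.
- Don't use text for timestamps (you are doing that in your example; we can tell because the timestamps were not truncated and still show .0).
- Don't name a field that holds a time date. If it holds a date and a time, it is a timestamp (and should be stored as one).

The best approach is to use window functions: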
SELECT id_type, grp, min(date), max(date)
FROM (
    SELECT date, id_type, count(is_reset) OVER (ORDER BY date) AS grp
    FROM (
        SELECT date, id_type,
               CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
        FROM tmp
    ) AS t
) AS g
GROUP BY id_type, grp
ORDER BY min(date);
Output
 id_type | grp |          min          |          max
---------+-----+-----------------------+-----------------------
       3 |   0 | 2017-01-10 07:19:21.0 | 2017-01-10 07:19:25.0
       5 |   1 | 2017-01-10 07:19:26.0 | 2017-01-10 07:19:26.0
       3 |   2 | 2017-01-10 07:19:27.1 | 2017-01-10 07:19:27.1
       5 |   3 | 2017-01-10 07:19:28.0 | 2017-01-10 07:19:29.0
       3 |   4 | 2017-01-10 07:19:30.1 | 2017-01-10 07:19:30.1
       5 |   5 | 2017-01-10 07:19:31.0 | 2017-01-10 07:19:31.0
       3 |   6 | 2017-01-10 07:19:32.0 | 2017-01-10 07:19:32.0
       5 |   7 | 2017-01-10 07:19:33.1 | 2017-01-10 07:19:37.1
(8 rows)
Explanation
First we need the resets, i.e. the rows where id_type differs from the previous row. We generate them with lag():

SELECT date, id_type,
       CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
FROM tmp
ORDER BY date;

         date          | id_type | is_reset
-----------------------+---------+----------
 2017-01-10 07:19:21.0 |       3 |
 2017-01-10 07:19:22.0 |       3 |
 2017-01-10 07:19:23.1 |       3 |
 2017-01-10 07:19:24.1 |       3 |
 2017-01-10 07:19:25.0 |       3 |
 2017-01-10 07:19:26.0 |       5 |        1
 2017-01-10 07:19:27.1 |       3 |        1
 2017-01-10 07:19:28.0 |       5 |        1
 2017-01-10 07:19:29.0 |       5 |
 2017-01-10 07:19:30.1 |       3 |        1
 2017-01-10 07:19:31.0 |       5 |        1
 2017-01-10 07:19:32.0 |       3 |        1
 2017-01-10 07:19:33.1 |       5 |        1
 2017-01-10 07:19:35.0 |       5 |
 2017-01-10 07:19:36.1 |       5 |
 2017-01-10 07:19:37.1 |       5 |
(16 rows)
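To see why the very first row is not flagged, it may help to look at lag() and the comparison side by side. The sketch below uses a tiny inline data set (the column name ts and the three rows are made up for illustration, not taken from tmp):

```sql
-- Hypothetical three-row data set, only to illustrate lag() and NULL handling.
SELECT ts,
       id_type,
       lag(id_type) OVER (ORDER BY ts)            AS prev_type,  -- NULL on the first row
       lag(id_type) OVER (ORDER BY ts) <> id_type AS changed     -- NULL <> 3 yields NULL, not true
FROM (VALUES (1, 3), (2, 3), (3, 5)) AS v(ts, id_type)
ORDER BY ts;

-- Rows returned (ts, id_type, prev_type, changed):
--   1, 3, NULL, NULL
--   2, 3, 3,    false
--   3, 5, 3,    true
```

Since the CASE expression has no ELSE branch, both NULL and false end up as a NULL is_reset, and only a real change produces 1.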
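Then we count the resets to get the groups: count(is_reset) ignores NULLs, so as a running count it goes up by one at each reset.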
SELECT date, id_type, count(is_reset) OVER (ORDER BY date) AS grp
FROM (
    SELECT date, id_type,
           CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
    FROM tmp
    ORDER BY date
) AS t
ORDER BY date;

         date          | id_type | grp
-----------------------+---------+-----
 2017-01-10 07:19:21.0 |       3 |   0
 2017-01-10 07:19:22.0 |       3 |   0
 2017-01-10 07:19:23.1 |       3 |   0
 2017-01-10 07:19:24.1 |       3 |   0
 2017-01-10 07:19:25.0 |       3 |   0
 2017-01-10 07:19:26.0 |       5 |   1
 2017-01-10 07:19:27.1 |       3 |   2
 2017-01-10 07:19:28.0 |       5 |   3
 2017-01-10 07:19:29.0 |       5 |   3
 2017-01-10 07:19:30.1 |       3 |   4
 2017-01-10 07:19:31.0 |       5 |   5
 2017-01-10 07:19:32.0 |       3 |   6
 2017-01-10 07:19:33.1 |       5 |   7
 2017-01-10 07:19:35.0 |       5 |   7
 2017-01-10 07:19:36.1 |       5 |   7
 2017-01-10 07:19:37.1 |       5 |   7
(16 rows)
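The reason count() behaves as a running count here is the default window frame: with an ORDER BY in the window and no explicit frame, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. A small sketch that spells the frame out next to the default (the inline rows and the names ts and flag are hypothetical):

```sql
-- Hypothetical four-row data set; flag plays the role of is_reset.
SELECT ts,
       flag,
       -- default frame when ORDER BY is present
       count(flag) OVER (ORDER BY ts) AS running_default,
       -- the same frame written explicitly
       count(flag) OVER (ORDER BY ts
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_explicit
FROM (VALUES (1, NULL::int), (2, 1), (3, NULL), (4, 1)) AS v(ts, flag)
ORDER BY ts;

-- Both count columns come out as 0, 1, 1, 2 because count(flag) skips NULLs.
```

One caveat that follows from the RANGE frame: rows with identical ORDER BY values are peers and receive the same count, so duplicate timestamps would need a tie-breaking column in the ORDER BY.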
Then we wrap that in a subselect, GROUP BY and ORDER BY, and select the min and max (the range):

SELECT id_type, grp, min(date), max(date)
FROM ( .. stuff ) AS g
GROUP BY id_type, grp
ORDER BY min(date);
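As for the additional nullable criteria mentioned in the question, one possible extension is to compare with IS DISTINCT FROM, which treats NULL as an ordinary comparable value, instead of <>. This is only a sketch: the column other_crit is a made-up name standing in for whatever the extra criterion turns out to be, and the ALTER TABLE is there only so the example can run against the sample table.

```sql
-- Assumption: other_crit is a hypothetical nullable column, added only for this sketch.
ALTER TABLE tmp ADD COLUMN other_crit integer;

SELECT id_type, other_crit, grp, min(date), max(date)
FROM (
    SELECT date, id_type, other_crit,
           count(is_reset) OVER (ORDER BY date) AS grp
    FROM (
        SELECT date, id_type, other_crit,
               -- A reset happens when any criterion differs from the previous row.
               -- IS DISTINCT FROM yields true/false even when one side is NULL.
               CASE WHEN lag(id_type)    OVER (ORDER BY date) IS DISTINCT FROM id_type
                      OR lag(other_crit) OVER (ORDER BY date) IS DISTINCT FROM other_crit
                    THEN 1 END AS is_reset
        FROM tmp
    ) AS t
) AS g
GROUP BY id_type, other_crit, grp
ORDER BY min(date);
```

A side effect of this variant is that the first row is usually flagged as a reset too (its lag() is NULL, which is distinct from a non-NULL id_type), so grp starts at 1 rather than 0. The grouping itself is unaffected.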