能夠根據事件查詢計算天數的 SQL 查詢
我有一個包含以下數據的表:
id date_changed color_start color_end ------------------------------------------------------------------- 1 2020-05-27 16:33:52.000 green yellow 1 2020-06-11 20:12:18.000 yellow red 1 2020-06-11 20:20:58.000 red green 2 2021-03-03 14:31:44.000 yellow red 2 2020-08-06 14:59:21.000 green yellow 3 2021-04-28 12:36:45.000 green red ...
例如,id #2 的項目在 2020-08-06 14:59:21 從綠色變為黃色,然後在 2021-03-03 14:31:44 從黃色變為讀取。我需要計算兩個時間範圍之間有多少項目處於綠色、黃色、紅色狀態。
我通過做一些研究來嘗試以下查詢,基本上列出過去一年的事件或每天,但這並不是我真正想要的。
SELECT d.date, items.id, count(CASE WHEN items.color_end = 'yellow' THEN 1 ELSE null END) as yellow_count, count(CASE WHEN items.color_end = 'green' THEN 1 ELSE null END) as green_count, count(CASE WHEN items.color_end = 'red' THEN 1 ELSE null END) as red_count, count(CASE WHEN items.color_end = 'yellow' THEN 1 ELSE null end) + count(CASE WHEN items.color_end = 'green' THEN 1 ELSE null END) + count(CASE WHEN items.color_end = 'red' THEN 1 ELSE null END) as total_count FROM (SELECT to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date FROM generate_series(0, 365, 1) AS offs ) d LEFT OUTER JOIN events items ON d.date = to_char(date_trunc('day', item.date_changed), 'YYYY-MM-DD') GROUP BY d.date, items.id;
您的問題是您的數據模型存在缺陷。
您有 6 條記錄:
id date_changed color_start color_end ------------------------------------------------------------------- 1 2020-05-27 16:33:52.000 green yellow 1 2020-06-11 20:12:18.000 yellow red 1 2020-06-11 20:20:58.000 red green 2 2021-03-03 14:31:44.000 yellow red 2 2020-08-06 14:59:21.000 green yellow 3 2021-04-28 12:36:45.000 green red
您應該有 9 條記錄。因此,您的記錄結構應該是這樣的(
id
僅適用於 = 1 的範例):id date_from date_to status 1 -infinity, 2020-05-27 16:33:52+01 green 1 2020-05-27 16:33:52+01, 2020-06-11 20:12:18+01 yellow 1 2020-06-11 20:12:18+01, 2020-06-11 20:20:58+01 red 1 2020-06-11 20:20:58+01 +infinity green
您試圖在兩個不同欄位的一條記錄中記錄兩種狀態。因此,有兩種可能的重構解決方案:
- 修改您的架構(請參閱此處的小提琴)。即使您不(或不能)更改基礎表,您也可以使用公用表表達式 (
CTE
) 重構目前數據,然後對其進行查詢。永久重構將是非常可取的!TSTZRANGE
使用數據類型修改架構(參見 fiddle here)。您可能還想考慮我所說的“輔助”解決方案——即滾動你自己的觸發器或可能
Temporal Tables
——目前僅作為(非貢獻)PostgreSQL 擴展提供。如果我對 PostgreSQL 有任何批評,其中之一就是他們還沒有(還)本機實現這個!我們從表/數據開始:
-- -- Original table design from the OP -- CREATE TABLE test ( id INT NOT NULL, dc TIMESTAMPTZ(0) NOT NULL, -- all times are xx:yy:zz.000, so use a precision of 0 cs TEXT NOT NULL, ce TEXT NOT NULL, UNIQUE (id, dc), CHECK (cs != ce) );
記錄:
-- -- OP's data -- INSERT INTO test VALUES (1, '2020-05-27 16:33:52', 'green', 'yellow'), (1, '2020-06-11 20:12:18', 'yellow', 'red'), (1, '2020-06-11 20:20:58', 'red', 'green'), (2, '2020-08-06 14:59:21', 'green', 'yellow'), (2, '2021-03-03 14:31:44', 'yellow', 'red'), (3, '2021-04-28 12:36:45', 'green', 'red');
第一個解決方案 - 每條記錄兩個
TIMESTAMPTZ
欄位(見小提琴):我們從使用
LAG()
和LEAD()
PostgreSQL視窗函式開始。視窗函式非常強大,非常值得學習。他們將回報多次研究它們所付出的任何努力。-- -- "Foundation" query - using this as a Common Table Expression, we can refactor our -- data - or better yet, refactor our base tables in our system. -- SELECT id, COALESCE(LAG(dc) OVER (PARTITION BY id ORDER BY dc), '-INFINITY') AS lag_dc, cs, dc, COALESCE(LEAD(dc) OVER (PARTITION BY id ORDER BY dc), 'INFINITY') AS lead_dc, ce FROM test;
結果(在小提琴上更好地查看):
id lag_dc cs dc lead_dc ce 1 -infinity green 2020-05-27 16:33:52+01 2020-06-11 20:12:18+01 yellow 1 2020-05-27 16:33:52+01 yellow 2020-06-11 20:12:18+01 2020-06-11 20:20:58+01 red 1 2020-06-11 20:12:18+01 red 2020-06-11 20:20:58+01 infinity green 2 -infinity green 2020-08-06 14:59:21+01 2021-03-03 14:31:44+00 yellow 2 2020-08-06 14:59:21+01 yellow 2021-03-03 14:31:44+00 infinity red 3 -infinity green 2021-04-28 12:36:45+01 infinity red
我不會重現小提琴上的每個查詢 - 這是第一個:
-- -- We get the first record of each set (by id) - from '-INFINITY' to the first -- date_changed (dc) -- WITH cte1 AS ( SELECT id, COALESCE(LAG(dc) OVER (PARTITION BY id ORDER BY dc), '-INFINITY') AS lag_dc, cs, dc, COALESCE(LEAD(dc) OVER (PARTITION BY id ORDER BY dc), 'INFINITY') AS lead_dc, ce FROM test ) SELECT c1.id, c1.lag_dc AS df, c1.dc AS dt, c1.cs FROM cte1 c1 WHERE lag_dc = '-INFINITY';
結果:
id df dt cs 1 -infinity 2020-05-27 16:33:52+01 green 2 -infinity 2020-08-06 14:59:21+01 green 3 -infinity 2021-04-28 12:36:45+01 green
然後我們
UNION
三個這樣的查詢:
- 每組的第一條記錄
id
,- 每組的中間記錄,
- 的最後記錄
id
。如下:
-- -- We now obtain the union of all 3 sets and we have our result! -- WITH cte AS ( SELECT id, COALESCE(LAG(dc) OVER (PARTITION BY id ORDER BY dc), '-INFINITY') AS lag_dc, cs, dc, COALESCE(LEAD(dc) OVER (PARTITION BY id ORDER BY dc), 'INFINITY') AS lead_dc, ce FROM test ) SELECT c1.id, c1.lag_dc AS "Date from:", c1.dc AS "Date to:", c1.cs AS "Colour" FROM cte c1 WHERE lag_dc = '-INFINITY' -- first records UNION ALL SELECT c2.id, c2.dc, c2.lead_dc, c2.ce FROM cte c2 WHERE lead_dc = 'INFINITY' -- last records UNION ALL SELECT c3.id, c3.dc, c3.lead_dc, c3.ce FROM cte c3 WHERE c3.lead_dc != 'INFINITY' -- middle records ORDER BY 1, 2;
結果:
id Date from: Date to: Colour 1 -infinity 2020-05-27 16:33:52+01 green 1 2020-05-27 16:33:52+01 2020-06-11 20:12:18+01 yellow 1 2020-06-11 20:12:18+01 2020-06-11 20:20:58+01 red 1 2020-06-11 20:20:58+01 infinity green 2 -infinity 2020-08-06 14:59:21+01 green 2 2020-08-06 14:59:21+01 2021-03-03 14:31:44+00 yellow 2 2021-03-03 14:31:44+00 infinity red 3 -infinity 2021-04-28 12:36:45+01 green 3 2021-04-28 12:36:45+01 infinity red
這使得查詢數據相對簡單 - 兩個範例查詢。
colour
第一個範例查詢: 2021 年的狀態變化( ):-- -- Records where the beginning and the end of the range falls -- anywhere >= 2021:01:01 00:00:00 -- -- It's bascially a record of any changes in status in 2021! -- SELECT * FROM test_rs WHERE ts_from >= '2021-01-01 00:00:00' ANd ts_to >= '2021-01-01 00:00:00' ORDER BY id, ts_from;
結果:
id ts_from ts_to colour 2 2021-03-03 14:31:44+00 infinity red 3 2021-04-28 12:36:45+01 infinity red
第二個範例查詢:某個時間點的狀態計數(2021 年元旦):
-- -- Status counts at exactly New Year, 2021 - we know that at the point, we had two -- entities with status green and 1 with status yellow -- SELECT colour, COUNT(colour) FROM test_rs WHERE ts_from <= '2021-01-01 00:00:00' AND ts_to >= '2021-01-01 00:00:00' GROUP BY colour ORDER BY colour;
結果:
colour count green 2 yellow 1
TSTZRANGE
第二種解決方案 -每條記錄一個欄位(見小提琴):從手冊中,我們有包容性和排斥性的界限和無限的(或無界的——也許是一個更好的術語)界限——即
[
或是]
包容性界限,(
並且)
是排斥性界限。此外,[,
或對於右括號 ( or )(,
來說是無界的上界和無界的。,]``,)
重構時間戳範圍查詢:
我不會通過所有查詢來獲得最終結果——這與上面使用的過程類似。僅顯示最終查詢:
WITH cte AS ( -- -- We don't need COALESCE in this case, since the range treats 'NULL' as -INFINITY -- or +INFIITY depending on whether it's at the beginning or end of the range. -- SELECT id, LAG(dc) OVER (PARTITION BY id ORDER BY dc) AS lag_dc, cs, dc, LEAD(dc) OVER (PARTITION BY id ORDER BY dc) AS lead_dc, ce FROM test ) SELECT c.id, TSTZRANGE(c.lag_dc, c.dc, '[)') AS "Date from:/Date to:", c.cs AS "Colour" FROM cte c WHERE c.lag_dc IS NULL UNION ALL SELECT c.id, TSTZRANGE(c.dc, c.lead_dc, '[)'), c.ce FROM cte c WHERE lead_dc IS NULL UNION ALL SELECT c.id, TSTZRANGE(c.dc, c.lead_dc, '[)'), c.ce FROM cte c WHERE c.lead_dc != 'INFINITY' ORDER BY 1, 2;
結果:
id Date from:/Date to: Colour 1 (,"2020-05-27 16:33:52+01") green 1 ["2020-05-27 16:33:52+01","2020-06-11 20:12:18+01") yellow 1 ["2020-06-11 20:12:18+01","2020-06-11 20:20:58+01") red 1 ["2020-06-11 20:20:58+01",) green 2 (,"2020-08-06 14:59:21+01") green 2 ["2020-08-06 14:59:21+01","2021-03-03 14:31:44+00") yellow 2 ["2021-03-03 14:31:44+00",) red 3 (,"2021-04-28 12:36:45+01") green 3 ["2021-04-28 12:36:45+01",) red
第一個範例查詢: 2021 年初之後的任何時間狀態變化。
-- -- Records where the beginning and the end of the range falls -- anywhere >= 2021:01:01 00:00:00 -- -- It's bascially a record of any changes in status in 2021! -- SELECT * FROM test_rs WHERE LOWER(df_dt) > '2021-01-01 00:00:00' AND df_dt && TSTZRANGE('2021-01-01 00:00:00'::TIMESTAMPTZ, NULL, '[)');
結果:
id df_dt colour 2 ["2021-03-03 14:31:44+00",) red 3 ["2021-04-28 12:36:45+01",) red
我們可以看到這等效於第一個範例查詢,其中 range 欄位代替了兩個時間戳。
該
WHERE
子句值得一看:LOWER(df_dt)
並且WHERE trb && TSTZRANGE('2021-01-01 00:00:00'::TIMESTAMPTZ, NULL, '[)');
- 它使用了&&
(overlaps) 運算符(請參閱手冊- 我也發現這篇文章很有幫助)。因此,我們可以看到任何與從
2021-01-01 00:00:00
( 開始的任何時間重疊的時間範圍...MPTZ, NULL, '[)');
都被拾取 - 請注意NULL
構造TSTZRANGE
值的過程。第二次查詢:2021 年初的顏色計數。
-- -- Status counts at exactly New Year, 2021 - we know that at the point, we had two -- entities with status green and 1 with status yellow -- SELECT colour, COUNT(colour) FROM test_rs WHERE df_dt && TSTZRANGE('2021-01-01 00:00:00', '2021-01-01 00:00:00', '[]') GROUP BY colour;
結果(與具有兩個時間戳的第二個範例查詢相同):
colour count green 2 yellow 1
輔助解決方案:
還有其他方法可以重構您的架構(和/或應用程式碼)以跟踪數據庫中的更改,特別是對於引用時間戳欄位的表 - 對這些的完整討論超出了此答案的範圍,但您可能需要考慮以下內容:
- 使用觸發器自己滾動或
- 安裝 Vlad Arkhipov 的
Temporal Table
解決方案。我在 POC 中使用過它,它似乎工作得很好,但沒有任何功能,比如MariaDB 的解決方案。這涉及編譯和安裝C
基於擴展的擴展。- 安裝 Vik Fearing 的(一個 PostgreSQL 主要貢獻者)時態表擴展(也在
C
)。我沒有使用它,但這個人是主要貢獻者的事實不言而喻。截至撰寫本文時,似乎是最新的!- Near Form 的
temporal_tables
功能,根據 github 連結是:a temporal_tables extension in PL/pgSQL, without the need for external c extension. The goal is to be able to use it on AWS RDS and other hosted solutions, where using custom extensions or c functions is not an option
. 聽起來很有趣,但我沒有嘗試過,所以無法發表評論。