PostgreSQL
Create a table/view of the time spans between start and end events in a log
Specifically, I have an event table that records when a user joins or leaves a team. It looks like this:
    ------------------------------------
    | user | event  | team | timestamp |
    ------------------------------------
    | A    | joined | 1    | 2016-1-1  |
    | B    | joined | 1    | 2016-1-1  |
    | C    | left   | 1    | 2016-1-1  |
    | C    | joined | 2    | 2016-1-1  |
    | A    | left   | 1    | 2016-1-2  |
    | A    | joined | 2    | 2016-1-2  |
    | B    | left   | 1    | 2016-1-3  |
    | A    | left   | 2    | 2016-1-3  |
    ------------------------------------
I need to restructure it so that it looks like this:
    -------------------------------------
    | user | team | joined   | left     |
    -------------------------------------
    | A    | 1    | 2016-1-1 | 2016-1-2 |
    | A    | 2    | 2016-1-2 | 2016-1-3 |
    | B    | 1    | 2016-1-1 | 2016-1-3 |
    | C    | 1    | null     | 2016-1-1 |
    | C    | 2    | 2016-1-1 | null     |
    -------------------------------------
How can I do this?
For more detail: I am trying to do this in Amazon Redshift (PostgreSQL).
Assuming all columns are NOT NULL, and that a "left" never comes before its associated "joined".

Simple case
If a user can join a team only once (ideally enforced by a UNIQUE constraint on ("user", team)), then the solution is a simple GROUP BY, and it works in Redshift as well as in almost any other RDBMS:

    SELECT "user", team
         , min(CASE WHEN event = 'joined' THEN timestamp END) AS joined
         , max(CASE WHEN event = 'left'   THEN timestamp END) AS "left"
    FROM   event
    GROUP  BY "user", team
    ORDER  BY "user", joined NULLS FIRST;
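Note the NULLS FIRST clause. It seems you want rows with an open start (joined IS NULL) sorted first. Redshift supports that, too. Other than that, it is the most basic form of a crosstab / pivot query.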
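Not so simple
Judging from your column names and sample data, it is probably not that simple. If a user can join the same team multiple times (without overlapping), you have to do more. You would not want to merge multiple team memberships into a single row, as a related answer does.
Instead, you have to pair up adjacent "joined" and "left" rows somehow. There are many ways to do that.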
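To see why the simple aggregation is not enough once a user can re-join a team, here is a minimal Python sketch that mimics the min/max-per-group logic of the simple-case query. The rows are hypothetical (they are not from the question), and `simple_group_by` is an illustrative helper, not part of any library.

```python
# Hypothetical rows (assumption, not from the question): A joins team 1 twice.
events = [
    ("A", "joined", 1, "2016-01-01"),
    ("A", "left",   1, "2016-01-02"),
    ("A", "joined", 1, "2016-01-05"),
    ("A", "left",   1, "2016-01-07"),
]

def simple_group_by(rows):
    """Mimic min(joined) / max(left) per (user, team), like the simple-case query."""
    spans = {}
    for user, event, team, ts in rows:
        span = spans.setdefault((user, team), {"joined": None, "left": None})
        if event == "joined":
            # Keep the earliest join; ISO date strings compare chronologically.
            span["joined"] = ts if span["joined"] is None else min(span["joined"], ts)
        else:
            # Keep the latest leave.
            span["left"] = ts if span["left"] is None else max(span["left"], ts)
    return spans

merged = simple_group_by(events)
print(merged)
```

Both memberships collapse into one span running from the first join to the last leave, which hides the gap in between. That is the merge the approaches below avoid.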
Postgres 9.4+
For modern Postgres, I like this one best:
    SELECT "user", team
         , min(timestamp) FILTER (WHERE event = 'joined') AS joined
         , max(timestamp) FILTER (WHERE event = 'left')   AS "left"
    FROM  (
       SELECT *, count(*) FILTER (WHERE event = 'joined')
                 OVER (PARTITION BY "user", team ORDER BY timestamp) AS ct
       FROM   event
       ) sub
    GROUP  BY "user", team, ct
    ORDER  BY "user", joined NULLS FIRST;
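This uses the aggregate FILTER clause in the window function as well as in the aggregate functions. The running count tells us how many times the same user has joined the same team so far, which lets us group each "joined" row with the "left" row that follows it. It also works when the leading 'joined' or the trailing 'left' is missing.

Redshift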
… does not support the new FILTER clause. We can substitute a plain old CASE expression:

    SELECT "user", team
         , min(CASE WHEN event = 'joined' THEN timestamp END) AS joined
         , max(CASE WHEN event = 'left'   THEN timestamp END) AS "left"
    FROM  (
       SELECT *, count(CASE WHEN event = 'joined' THEN 1 END)
                 OVER (PARTITION BY "user", team
                       ORDER BY timestamp, event
                       -- Redshift requires an explicit frame clause for
                       -- aggregate window functions that use ORDER BY
                       ROWS UNBOUNDED PRECEDING) AS ct
       FROM   event
       ) sub
    GROUP  BY "user", team, ct
    ORDER  BY "user", joined NULLS FIRST;
**Aside:** You should not use reserved words as identifiers, even if Redshift (or Postgres) allows it.
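The running-count grouping used in the two queries above can be traced outside the database. The following is a rough Python sketch of the same idea, run on the sample rows from the question (dates rewritten in zero-padded ISO form so that string comparison matches date order). `pair_spans` is a hypothetical helper written for illustration only.

```python
from collections import defaultdict

# Sample rows from the question: (user, event, team, timestamp).
events = [
    ("A", "joined", 1, "2016-01-01"),
    ("B", "joined", 1, "2016-01-01"),
    ("C", "left",   1, "2016-01-01"),
    ("C", "joined", 2, "2016-01-01"),
    ("A", "left",   1, "2016-01-02"),
    ("A", "joined", 2, "2016-01-02"),
    ("B", "left",   1, "2016-01-03"),
    ("A", "left",   2, "2016-01-03"),
]

def pair_spans(rows):
    """Group rows by (user, team, running count of 'joined' rows)."""
    # Same order as PARTITION BY "user", team ORDER BY timestamp, event.
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[3], r[1]))
    joins_seen = defaultdict(int)   # running count per (user, team)
    spans = {}
    for user, event, team, ts in ordered:
        if event == "joined":
            joins_seen[(user, team)] += 1
        ct = joins_seen[(user, team)]   # a 'left' with no prior join gets ct = 0
        span = spans.setdefault((user, team, ct), {"joined": None, "left": None})
        span[event] = ts                # event is either 'joined' or 'left'
    return [(u, t, s["joined"], s["left"]) for (u, t, _), s in spans.items()]

result = pair_spans(events)
for row in result:
    print(row)
```

User C's "left" row for team 1 has no preceding "joined", so it lands in group 0 and yields a span with an open start, matching the null in the desired output.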
You can use conditional aggregation to get the 'joined' / 'left' values:

    SELECT "user", team,
           MAX(CASE WHEN event = 'joined' THEN timestamp END) AS joined,
           MAX(CASE WHEN event = 'left'   THEN timestamp END) AS left
    FROM mytable
    GROUP BY "user", team
    ORDER BY "user", team