PostgreSQL
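Query for more than X occurrences within any time period of a given length in minutes
I have spent a lot of time looking for a solution to my problem, but I gave up.
Let's imagine a table: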

    user_id | occurred_at
    -- Not matched example
    1 | 2020-01-01 08:00:00   <- First match of the set
    1 | 2020-01-01 08:08:00   <- Second match (8 minutes away from the previous, so OK)
    1 | 2020-01-01 08:10:30   <- this already exceeds the 10 minute period, so the set is excluded
    -- OK match example
    1 | 2020-01-01 10:00:00   <- First match
    1 | 2020-01-01 10:05:00   <- Second match (5 minutes away from the previous, so OK)
    1 | 2020-01-01 10:09:59   <- this fits into the 10 minute period, so the set is matched (09:59 away altogether from 10:00:00)
    -- Another OK (4 matched)
    2 | 2020-01-01 14:23:00
    2 | 2020-01-01 14:24:00
    2 | 2020-01-01 14:26:00
    2 | 2020-01-01 14:27:00
    -- Not matched
    3 | 2020-01-01 11:00:00
    3 | 2020-01-01 11:01:00
    3 | 2020-01-01 15:26:00
    3 | 2020-01-01 18:00:00
    -- User mismatch, so the set is not matched either
    3 | 2020-01-01 20:00:00
    1 | 2020-01-01 20:01:00
    2 | 2020-01-01 20:02:00

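How can I query such a table to find the rows where a given user occurs at least N times (= 3 in this example) within an explicit interval of minutes (= 10 in this example)? I think the example table above explains it better.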
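To try the queries in the answer, the sample rows from the question can be loaded with a setup script along these lines. This is only a sketch: the table name `timestamps` is taken from the answer's queries, while the column types (`int`, `timestamp`) are assumptions, since the question does not state them.

```sql
-- Assumed schema: table name from the answer's queries, column types are a guess.
CREATE TABLE timestamps (
   user_id     int       NOT NULL
 , occurred_at timestamp NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO timestamps (user_id, occurred_at) VALUES
   (1, '2020-01-01 08:00:00')
 , (1, '2020-01-01 08:08:00')
 , (1, '2020-01-01 08:10:30')
 , (1, '2020-01-01 10:00:00')
 , (1, '2020-01-01 10:05:00')
 , (1, '2020-01-01 10:09:59')
 , (2, '2020-01-01 14:23:00')
 , (2, '2020-01-01 14:24:00')
 , (2, '2020-01-01 14:26:00')
 , (2, '2020-01-01 14:27:00')
 , (3, '2020-01-01 11:00:00')
 , (3, '2020-01-01 11:01:00')
 , (3, '2020-01-01 15:26:00')
 , (3, '2020-01-01 18:00:00')
 , (3, '2020-01-01 20:00:00')
 , (1, '2020-01-01 20:01:00')
 , (2, '2020-01-01 20:02:00')
;
```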
With the window function `lag()`, we can flag all rows that end a qualifying set:

    SELECT *
         , occurred_at - lag(occurred_at, 2) OVER (PARTITION BY user_id ORDER BY occurred_at)
           <= interval '10 min' AS passed
    FROM   timestamps
    ORDER  BY user_id, occurred_at;

If the timestamp (`occurred_at`) two rows back for the same user (`user_id`) lies within 10 minutes, we have a set of three. For a **given `user_id`**:

    SELECT count(*) FILTER (WHERE passed) AS qualifying_sets
    FROM  (
       SELECT occurred_at - lag(occurred_at, 2) OVER (ORDER BY occurred_at)
              <= interval '10 min' AS passed
       FROM   timestamps
       WHERE  user_id = 1  -- given user
       ) sub;

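The offset `2` and the interval `10 min` are specific to N = 3 and a 10 minute window. For a general N, the row to compare against is N - 1 rows back. As a rough sketch, the query for a given user could be parameterized with a prepared statement. The name `count_sets` is made up for illustration, and `user_id` is assumed to be of type `int`:

```sql
-- Hypothetical prepared statement; the name "count_sets" is not from the original answer.
-- $1 = user_id (assumed to be int)
-- $2 = minimum number of rows per set (N)
-- $3 = length of the time window
PREPARE count_sets(int, int, interval) AS
SELECT count(*) FILTER (WHERE passed) AS qualifying_sets
FROM  (
   -- compare each row with the row N - 1 positions earlier for the same user
   SELECT occurred_at - lag(occurred_at, $2 - 1) OVER (ORDER BY occurred_at) <= $3 AS passed
   FROM   timestamps
   WHERE  user_id = $1
   ) sub;

-- Same parameters as the hard-coded query: user 1, N = 3, 10 minutes
EXECUTE count_sets(1, 3, interval '10 min');
```

A prepared statement only lives for the current session. Wrapping the same query in an SQL function would be another option.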
To get **all `user_id`** that pass the test at least once:

    SELECT user_id, count(*) FILTER (WHERE passed) AS qualifying_sets
    FROM  (
       SELECT user_id
            , occurred_at - lag(occurred_at, 2) OVER (PARTITION BY user_id ORDER BY occurred_at)
              <= interval '10 min' AS passed
       FROM   timestamps
       ) sub
    GROUP  BY 1
    HAVING bool_or(passed)
    ORDER  BY 1;

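The added count of `qualifying_sets` is optional. Note that overlapping sets are counted separately: the four rows of user 2 within four minutes yield two qualifying sets.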
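The question also asks for the rows themselves, not only the users. One possible way to get them, sketched here for N = 3 only, is to check all three positions a row can take in a set of three consecutive rows: last, middle, or first. The column aliases `ends_set`, `mid_set` and `starts_set` are made up for illustration.

```sql
SELECT user_id, occurred_at
FROM  (
   SELECT user_id, occurred_at
          -- row is the last of three: two rows back lies within 10 minutes
        , occurred_at - lag(occurred_at, 2) OVER w <= interval '10 min' AS ends_set
          -- row is the middle of three: previous and next row lie within 10 minutes of each other
        , lead(occurred_at, 1) OVER w - lag(occurred_at, 1) OVER w <= interval '10 min' AS mid_set
          -- row is the first of three: two rows ahead lies within 10 minutes
        , lead(occurred_at, 2) OVER w - occurred_at <= interval '10 min' AS starts_set
   FROM   timestamps
   WINDOW w AS (PARTITION BY user_id ORDER BY occurred_at)
   ) sub
WHERE  ends_set OR mid_set OR starts_set  -- NULL (no such neighbor) counts as "not passed"
ORDER  BY user_id, occurred_at;
```

With the sample data from the question, this should return the three rows of user 1 between 10:00:00 and 10:09:59 and the four rows of user 2 between 14:23:00 and 14:27:00. For a larger N, this approach needs one condition per position, so a different technique may be more practical.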