事件數組對的聚合查詢
db<>fiddle用於下面的所有數據和查詢
我有一個
events
具有以下結構的表:create table events ( correlation_id char(26) not null, user_id bigint, task_id bigint not null, location_id bigint, type bigint not null, created_at timestamp(6) with time zone not null, constraint events_correlation_id_created_at_user_id_unique unique (correlation_id, created_at, user_id) );
此表包含正在執行的任務的記錄,如下所示:
帶有 的行
type = 0
表示任務的開始,帶有 的行type = 99
表示任務的結束。(其他值表示與此問題無關的其他內容,但為了完整起見,此處包含兩個範例行。)每個
task_id
對應於tasks
表中的一行。任務表中唯一與該問題相關的其他欄位稱為inprogress_status
,它可以是1
or2
,分別表示Opening task
和Closing task
。我最初被要求提供一個查詢,該查詢將返回按開始日期和位置排序的任務列表,其中一行包含每個任務的開始 (
type = 0
) 和結束 (type = 99
)。這是我曾經這樣做的查詢:
SELECT e.created_at::DATE, e.location_id, e.task_id , CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type , e.correlation_id , json_object_agg(e.type, json_build_object('timestamp', e.created_at, 'user_id', e.user_id)) AS events FROM events e JOIN tasks t on e.task_id = t.id WHERE e.type IN (0, 99) AND t.inprogress_status IN (1, 2) group by created_at::DATE, location_id, task_id, correlation_id, inprogress_status ORDER BY 1, 2, 3;
這是使用上面顯示的數據進行查詢的結果:
created_at location_id task_id 任務類型 相關標識 事件 2018-08-17 30 58 打開 01CN4HP4AN0000000000000001 {"0": {"timestamp": "2018-08-17T18:17:15.348629+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:17:25.535593+00:00", "user_id": 4} }
2018-08-17 30 97 關 01CN4J9SZ80000000000000003 {"0": {"timestamp": "2018-08-17T18:28:00.104093+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:29:09.01684+00:00", "user_id": 4} }
2018-08-17 30 99 打開 01CN4JC1430000000000000004 { "0": {"timestamp": "2018-08-17T18:29:12.963264+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:32:09.272632+00:00", "user_id": 4} }
2018-08-17 30 97 關 01CN4KJCDY0000000000000005 { "0": {"timestamp": "2018-08-17T18:50:09.725668+00:00", "user_id": 139}, "99": {"timestamp": "2018-08-17T18:51:42.240895+00:00", "user_id": 139} }
2018-08-20 30 99 打開 01CNC4G1Y40000000000000008 { "0": {"timestamp": "2018-08-20T17:00:40.26043+00:00", "user_id": 139}, "99" : {"timestamp": "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }
在上面的例子中,
task_id
58
and99
haveinprogress_status = 1
和task_id
97
hasinprogress_status = 2
。現在我被要求修改返回的資料結構,以便它也可以聚合
inprogress_status
,並將行作為 OPEN+CLOSE 事件對返回。為了弄清楚如何建構它,我首先嘗試獲取這種格式(我真正想要的最終格式如下):
created_at location_id 事件 2018-08-17 30 {"OPEN": [{"correlation_id": "01CN4HP4AN0000000000000001", "0" : {"timestamp" : "2018-08-17T18:17:15.348629+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:17:25.535593+00:00", "user_id" : 4} }, {"OPEN": {"correlation_id": "01CN4JC1430000000000000004", "0" : {"timestamp" : "2018-08-17T18:29:12.963264+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:32:09.272632+00:00", "user_id" : 4} }], "CLOSE": [{"correlation_id": "01CN4J9SZ80000000000000003", "0" : {"timestamp" : "2018-08-17T18:28:00.104093+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:29:09.01684+00:00", "user_id" : 4} }, { "correlation_id": "01CN4KJCDY0000000000000005", "0" : {"timestamp" : "2018-08-17T18:50:09.725668+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-17T18:51:42.240895+00:00", "user_id" : 139} }]}
2018-08-20 30 {"OPEN": [{"correlation_id": "01CNC4G1Y40000000000000008", "0" : {"timestamp" : "2018-08-20T17:00:40.26043+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }], "CLOSE": null}
這是我寫的第一個查詢,試圖使這項工作:
WITH grouped_events AS ( SELECT e.created_at::DATE AS created_date, location_id, task_id, CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type, jsonb_build_object('id', e.correlation_id) || jsonb_object_agg(type, jsonb_build_object('timestamp', e.created_at, 'user_id', user_id)) AS events FROM events e JOIN tasks t on e.task_id = t.id WHERE type IN (0, 99) AND inprogress_status IN (1, 2) GROUP BY e.created_at::DATE, location_id, task_id, correlation_id, t.inprogress_status ) SELECT created_date, location_id, json_object_agg(task_type, events) FROM grouped_events GROUP BY 1, 2 ORDER BY 1, 2
問題是這會產生無效的 JSON。具有多個相同的鍵:
{ "OPEN": { "0": { "user_id": 4, "timestamp": "2018-08-17T18:29:12.963264+00:00" }, "99": { "user_id": 4, "timestamp": "2018-08-17T18:32:09.272632+00:00" }, "id": "01CN4JC1430000000000000004" }, "OPEN": { "0": { "user_id": 4, "timestamp": "2018-08-17T18:17:15.348629+00:00" }, "99": { "user_id": 4, "timestamp": "2018-08-17T18:17:25.535593+00:00" }, "id": "01CN4HP4AN0000000000000001" }, // ... etc. }
我發現這個查詢以上面顯示的格式返回數據:
WITH grouped_events1 AS ( SELECT e.created_at::DATE AS created_date, location_id, task_id, CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type, jsonb_build_object('id', e.correlation_id) || jsonb_object_agg(type, jsonb_build_object('timestamp', e.created_at, 'user_id', user_id)) AS events FROM events e JOIN tasks t on e.task_id = t.id WHERE type IN (0, 99) AND inprogress_status IN (1, 2) GROUP BY e.created_at::DATE, location_id, task_id, correlation_id, t.inprogress_status ), grouped_events2 AS ( SELECT created_date, location_id, task_type, json_agg(events) AS events FROM grouped_events1 GROUP BY 1, 2, 3 ) SELECT created_date, location_id, json_object_agg(task_type, events) FROM grouped_events2 GROUP BY 1, 2 ORDER BY 1, 2
但是,我實際需要的格式應該只是將單個 OPEN 與單個 CLOSE 配對,如下所示(每個 OPEN 和緊隨其後的 CLOSE):
created_at location_id 事件 2018-08-17 30 {"OPEN": {"correlation_id": "01CN4HP4AN0000000000000001", "0" : {"timestamp" : "2018-08-17T18:17:15.348629+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:17:25.535593+00:00", "user_id" : 4} }, "CLOSE": {"correlation_id": "01CN4J9SZ80000000000000003", "0" : {"timestamp" : "2018-08-17T18:28:00.104093+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:29:09.01684+00:00", "user_id" : 4} }}
2018-08-17 30 {"OPEN": {"OPEN": {"correlation_id": "01CN4JC1430000000000000004", "0" : {"timestamp" : "2018-08-17T18:29:12.963264+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:32:09.272632+00:00", "user_id" : 4} }, "CLOSE": { "correlation_id": "01CN4KJCDY0000000000000005", "0" : {"timestamp" : "2018-08-17T18:50:09.725668+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-17T18:51:42.240895+00:00", "user_id" : 139} }}
2018-08-20 30 {"OPEN": [{"correlation_id": "01CNC4G1Y40000000000000008", "0" : {"timestamp" : "2018-08-20T17:00:40.26043+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }], "CLOSE": null}
現在我想弄清楚我是否走錯了方向,因為我看不出如何從我所擁有的東西中得到我的最終格式。
我接近這個錯誤嗎?我怎樣才能得到我正在尋找的結果?
這會產生您想要的結果:
SELECT the_day, location_id , jsonb_object_agg(task_type, events || jsonb_build_object('correlation_id', correlation_id)) AS events FROM ( SELECT e.created_at::date AS the_day, e.location_id, e.correlation_id , count(*) FILTER (WHERE t.inprogress_status = 1) OVER (PARTITION BY e.location_id ORDER BY min(e.created_at) FILTER (WHERE e.type = 0)) AS task_nr , CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type , jsonb_object_agg(e.type, jsonb_build_object('timestamp', e.created_at, 'user_id', e.user_id)) AS events FROM events e JOIN tasks t on e.task_id = t.id WHERE e.type IN (0, 99) AND t.inprogress_status IN (1, 2) GROUP BY 1, 2, e.correlation_id, t.inprogress_status ) sub GROUP BY the_day, location_id, task_nr ORDER BY the_day, location_id, task_nr;
db<>在這裡擺弄
除了在一天開始時缺少“OPEN”事件和在最後缺少“CLOSE”事件之外只是缺少。
我使用
jsonb
而不是json
允許jsonb || jsonb
操作員。json
如果您確實需要,您可以將結果轉換為。核心特徵是形成任務編號的複雜表達式:
, count(*) FILTER (WHERE t.inprogress_status = 1) OVER (PARTITION BY e.location_id ORDER BY min(e.created_at) FILTER (WHERE e.type = 0)) AS task_nr
每個“打開”任務都會啟動一個新組。
created_at
with定義了任務的type = 0
順序。從技術上講,這是可行的,因為我們可以FILTER
在視窗函式中嵌套聚合函式(甚至使用聚合子句)。相關答案: