Postgresql

事件數組對的聚合查詢

  • August 9, 2022

db<>fiddle用於下面的所有數據和查詢

我有一個events具有以下結構的表:

create table events (
   correlation_id char(26) not null,
   user_id        bigint,
   task_id        bigint not null,
   location_id    bigint,
   type           bigint not null,
   created_at     timestamp(6) with time zone not null,
   constraint events_correlation_id_created_at_user_id_unique
       unique (correlation_id, created_at, user_id)
);

此表包含正在執行的任務的記錄,如下所示:

帶有 的行type = 0表示任務的開始,帶有 的行type = 99表示任務的結束。(其他值表示與此問題無關的其他內容,但為了完整起見,此處包含兩個範例行。)

每個task_id對應於tasks表中的一行。任務表中唯一與該問題相關的其他欄位稱為inprogress_status,它可以是1or 2,分別表示Opening taskClosing task

我最初被要求提供一個查詢,該查詢將返回按開始日期和位置排序的任務列表,其中一行包含每個任務的開始 ( type = 0) 和結束 ( type = 99)。

這是我曾經這樣做的查詢:

SELECT e.created_at::DATE, e.location_id, e.task_id
    , CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type
    , e.correlation_id
    , json_object_agg(e.type, json_build_object('timestamp', e.created_at, 'user_id', e.user_id)) AS events
FROM events e
JOIN tasks t on e.task_id = t.id
WHERE e.type IN (0, 99)
AND t.inprogress_status IN (1, 2)
group by created_at::DATE, location_id, task_id, correlation_id, inprogress_status
ORDER BY 1, 2, 3;

這是使用上面顯示的數據進行查詢的結果:

created_atlocation_idtask_id任務類型相關標識事件
2018-08-173058打開01CN4HP4AN0000000000000001{"0": {"timestamp": "2018-08-17T18:17:15.348629+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:17:25.535593+00:00", "user_id": 4} }
2018-08-17309701CN4J9SZ80000000000000003{"0": {"timestamp": "2018-08-17T18:28:00.104093+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:29:09.01684+00:00", "user_id": 4} }
2018-08-173099打開01CN4JC1430000000000000004{ "0": {"timestamp": "2018-08-17T18:29:12.963264+00:00", "user_id": 4}, "99": {"timestamp": "2018-08-17T18:32:09.272632+00:00", "user_id": 4} }
2018-08-17309701CN4KJCDY0000000000000005{ "0": {"timestamp": "2018-08-17T18:50:09.725668+00:00", "user_id": 139}, "99": {"timestamp": "2018-08-17T18:51:42.240895+00:00", "user_id": 139} }
2018-08-203099打開01CNC4G1Y40000000000000008{ "0": {"timestamp": "2018-08-20T17:00:40.26043+00:00", "user_id": 139}, "99" : {"timestamp": "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }

在上面的例子中,task_id 58and 99haveinprogress_status = 1task_id 97has inprogress_status = 2

現在我被要求修改返回的資料結構,以便它也可以聚合inprogress_status,並將行作為 OPEN+CLOSE 事件對返回。

為了弄清楚如何建構它,我首先嘗試獲取這種格式(我真正想要的最終格式如下):

created_atlocation_id事件
2018-08-1730{"OPEN": [{"correlation_id": "01CN4HP4AN0000000000000001", "0" : {"timestamp" : "2018-08-17T18:17:15.348629+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:17:25.535593+00:00", "user_id" : 4} }, {"OPEN": {"correlation_id": "01CN4JC1430000000000000004", "0" : {"timestamp" : "2018-08-17T18:29:12.963264+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:32:09.272632+00:00", "user_id" : 4} }], "CLOSE": [{"correlation_id": "01CN4J9SZ80000000000000003", "0" : {"timestamp" : "2018-08-17T18:28:00.104093+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:29:09.01684+00:00", "user_id" : 4} }, { "correlation_id": "01CN4KJCDY0000000000000005", "0" : {"timestamp" : "2018-08-17T18:50:09.725668+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-17T18:51:42.240895+00:00", "user_id" : 139} }]}
2018-08-2030{"OPEN": [{"correlation_id": "01CNC4G1Y40000000000000008", "0" : {"timestamp" : "2018-08-20T17:00:40.26043+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }], "CLOSE": null}

這是我寫的第一個查詢,試圖使這項工作:

WITH grouped_events AS (
   SELECT e.created_at::DATE AS created_date,
       location_id,
       task_id,
       CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type,
       jsonb_build_object('id', e.correlation_id) ||
               jsonb_object_agg(type, jsonb_build_object('timestamp', e.created_at, 'user_id', user_id)) AS events
   FROM events e
   JOIN tasks t on e.task_id = t.id
   WHERE type IN (0, 99)
   AND inprogress_status IN (1, 2)
   GROUP BY e.created_at::DATE, location_id, task_id, correlation_id, t.inprogress_status
)
SELECT created_date, location_id, json_object_agg(task_type, events)
FROM grouped_events
GROUP BY 1, 2
ORDER BY 1, 2

問題是這會產生無效的 JSON。具有多個相同的鍵:

{
   "OPEN": {
       "0": { "user_id": 4, "timestamp": "2018-08-17T18:29:12.963264+00:00" },
       "99": { "user_id": 4, "timestamp": "2018-08-17T18:32:09.272632+00:00" },
       "id": "01CN4JC1430000000000000004"
   },
   "OPEN": {
       "0": { "user_id": 4, "timestamp": "2018-08-17T18:17:15.348629+00:00" },
       "99": { "user_id": 4, "timestamp": "2018-08-17T18:17:25.535593+00:00" },
       "id": "01CN4HP4AN0000000000000001"
   },
   // ... etc.
}

我發現這個查詢以上面顯示的格式返回數據:

WITH grouped_events1 AS (
   SELECT e.created_at::DATE AS created_date,
       location_id,
       task_id,
       CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type,
       jsonb_build_object('id', e.correlation_id) ||
               jsonb_object_agg(type, jsonb_build_object('timestamp', e.created_at, 'user_id', user_id)) AS events
   FROM events e
   JOIN tasks t on e.task_id = t.id
   WHERE type IN (0, 99)
   AND inprogress_status IN (1, 2)
   GROUP BY e.created_at::DATE, location_id, task_id, correlation_id, t.inprogress_status
), grouped_events2 AS (
   SELECT created_date, location_id, task_type, json_agg(events) AS events
   FROM grouped_events1
   GROUP BY 1, 2, 3
)
SELECT created_date, location_id, json_object_agg(task_type, events)
FROM grouped_events2
GROUP BY 1, 2
ORDER BY 1, 2

但是,我實際需要的格式應該只是將單個 OPEN 與單個 CLOSE 配對,如下所示(每個 OPEN 和緊隨其後的 CLOSE):

created_atlocation_id事件
2018-08-1730{"OPEN": {"correlation_id": "01CN4HP4AN0000000000000001", "0" : {"timestamp" : "2018-08-17T18:17:15.348629+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:17:25.535593+00:00", "user_id" : 4} }, "CLOSE": {"correlation_id": "01CN4J9SZ80000000000000003", "0" : {"timestamp" : "2018-08-17T18:28:00.104093+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:29:09.01684+00:00", "user_id" : 4} }}
2018-08-1730{"OPEN": {"OPEN": {"correlation_id": "01CN4JC1430000000000000004", "0" : {"timestamp" : "2018-08-17T18:29:12.963264+00:00", "user_id" : 4}, "99" : {"timestamp" : "2018-08-17T18:32:09.272632+00:00", "user_id" : 4} }, "CLOSE": { "correlation_id": "01CN4KJCDY0000000000000005", "0" : {"timestamp" : "2018-08-17T18:50:09.725668+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-17T18:51:42.240895+00:00", "user_id" : 139} }}
2018-08-2030{"OPEN": [{"correlation_id": "01CNC4G1Y40000000000000008", "0" : {"timestamp" : "2018-08-20T17:00:40.26043+00:00", "user_id" : 139}, "99" : {"timestamp" : "2018-08-20T17:00:47.583501+00:00", "user_id" : 139} }], "CLOSE": null}

現在我想弄清楚我是否走錯了方向,因為我看不出如何從我所擁有的東西中得到我的最終格式。

我接近這個錯誤嗎?我怎樣才能得到我正在尋找的結果?

這會產生您想要的結果:

SELECT the_day, location_id
    , jsonb_object_agg(task_type, events || jsonb_build_object('correlation_id', correlation_id)) AS events
FROM  (
  SELECT e.created_at::date AS the_day, e.location_id, e.correlation_id
       , count(*) FILTER (WHERE t.inprogress_status = 1)
                  OVER (PARTITION BY e.location_id ORDER BY min(e.created_at) FILTER (WHERE e.type = 0)) AS task_nr
       , CASE t.inprogress_status WHEN 2 THEN 'CLOSE' WHEN 1 THEN 'OPEN' END AS task_type     
       , jsonb_object_agg(e.type, jsonb_build_object('timestamp', e.created_at, 'user_id', e.user_id)) AS events
  FROM   events e
  JOIN   tasks t on e.task_id = t.id
  WHERE  e.type IN (0, 99)
  AND    t.inprogress_status IN (1, 2)
  GROUP  BY 1, 2, e.correlation_id, t.inprogress_status
  ) sub
GROUP  BY the_day, location_id, task_nr
ORDER  BY the_day, location_id, task_nr;

db<>在這裡擺弄

除了在一天開始時缺少“OPEN”事件和在最後缺少“CLOSE”事件之外只是缺少。

我使用jsonb而不是json允許jsonb || jsonb操作員json如果您確實需要,您可以將結果轉換為。

核心特徵是形成任務編號的複雜表達式:

   , count(*) FILTER (WHERE t.inprogress_status = 1)
              OVER (PARTITION BY e.location_id ORDER BY min(e.created_at) FILTER (WHERE e.type = 0)) AS task_nr

每個“打開”任務都會啟動一個新組。created_atwith定義了任務的type = 0順序。從技術上講,這是可行的,因為我們可以FILTER在視窗函式中嵌套聚合函式(甚至使用聚合子句)。

相關答案:

引用自:https://dba.stackexchange.com/questions/315379