Postgres: group and aggregate (sum) JSONb arrays and non-JSONb attributes

June 16, 2020

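PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc, 224fe214a p 3971489d3e, 64-bit
I am trying to build a query that groups (distinctly) the values from several JSONb columns and sums values from another JSONb column and from non-JSONb columns.
Table definition (I have removed the other, irrelevant columns)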
id varchar(255) NOT NULL,
casualties jsonb NOT NULL,
involved_parties jsonb NULL,
tags jsonb NULL,
reported_at int8 NULL,
casualties
Each row has one object holding the incident's casualty counts across categories.
{"police_deaths": 0, "civilian_deaths": 0, "criminal_deaths": 0, "military_deaths": 0, "police_injuries": 0, "emergency_deaths": 0, "civilian_injuries": 1, "criminal_injuries": 1, "military_injuries": 0, "emergency_injuries": 0}
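involved_parties
This is an array of objects. Each row has zero or more involved parties (those who took part in the incident). The data looks a little misleading at first, because every entry in the array carries an ID for the involved party/incident relationship. That ID doesn't really give us anything and I don't need it, but it is currently in the data.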
[
 {"id": "2a0fd9dc-40bd-40dc-88ce-bc819fe9cdd8", "type": "group", "group": {"id": "6d342bfc-72c4-4588-ab95-1b3bdfb4881a", "name": "Naxals"}, "involvement": "Actor"}, 
 {"id": "dafc4726-3d3d-40cb-bbaf-63fa57250b44", "type": "group", "group": {"id": "18c6d3f6-c3eb-45db-9a02-26606f85d7eb", "name": "Indian Security Forces"}, "involvement": "Directly Targeted"}
]
It is the group and involvement data that I'm interested in.
affected_sectors
The structure is much like involved_parties.
[
 {"id": "fcb952ef-3139-4fe7-ba15-7d800bdc60ae", "sector": {"id": "668d330e-aee5-4291-be98-df9c32b5b420", "name": "Military"}}, 
 {"id": "d1b71bae-29ac-48a2-ab41-a6979d720171", "sector": {"id": "550a4aa0-6d6f-4be2-ba33-f35d159ee686", "name": "Police/Law"}}
]
It is the sector that I'm interested in.
reported_at
This is the epoch representation of when our analysts reported the incident.
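As a side note, an epoch value like this can be turned into a readable timestamp with to_timestamp. The following is only a sketch on made-up sample values, and it assumes reported_at holds epoch seconds; if the column actually stores milliseconds, divide by 1000.0 first.

```sql
-- Hypothetical sample values; the real column is incidents.reported_at (int8).
-- Assumption: the values are epoch seconds, not milliseconds.
with sample(reported_at) as (
    values (1592265600::int8), (1592352000::int8)
)
select to_timestamp(min(reported_at)) as min_reported_at_ts,
       to_timestamp(max(reported_at)) as max_reported_at_ts
from sample;
```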
Desired output
For the records in the query, I would like a single row. That single row has the following columns:
incident_count,
casualties,
involved_parties,
tags,
min_reported_at,
max_reported_at
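The incident count should simply be the number of rows represented.
The casualties object always has the same properties in its JSON, and I would like to sum them. So there would be one object containing the totals of all police deaths, civilian deaths, and so on.
For involved parties and affected sectors, each should be an array containing the unique set of parties/sectors found across the rows.
The min/max reported_at should be the minimum/maximum across all rows.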
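To make the casualty requirement concrete, here is a minimal sketch on two made-up rows with only two of the keys. It expands each object into key/value rows with jsonb_each_text, sums per key, and folds the totals back into one object with jsonb_object_agg. The sample CTE and its values are illustrative only.

```sql
-- Hypothetical sample rows standing in for incidents.casualties.
with sample(casualties) as (
    values
        ('{"police_deaths": 0, "civilian_injuries": 1}'::jsonb),
        ('{"police_deaths": 2, "civilian_injuries": 3}'::jsonb)
)
select jsonb_object_agg(key, total) as casualties
from (
    -- One row per key, with the values summed across all sample rows.
    select key, sum(value::numeric) as total
    from sample
    cross join lateral jsonb_each_text(sample.casualties)
    group by key
) per_key;
```

With these two rows, the totals should come out as 2 for police_deaths and 4 for civilian_injuries.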
I tried this as a starting point:
select 
   jsonb_agg(incidents.affected_sectors) as affected_sectors,
   jsonb_agg(incidents.involved_parties) as involved_parties
from incidents
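But this is very slow (9 seconds). So I tried expanding each object into its own row and then collapsing the rows back again, but I got lost along the way.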
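Note that jsonb_agg over an array column collects each row's whole array as one element, so the result is an array of arrays rather than a flat, unique set. A minimal sketch of the expand-then-collapse idea, on made-up rows with shortened IDs, could look like this. It relies on jsonb values being comparable, which is what allows DISTINCT inside the aggregate.

```sql
-- Hypothetical sample rows standing in for incidents.involved_parties.
with sample(involved_parties) as (
    values
        ('[{"id": "a1", "group": {"id": "g1", "name": "Naxals"}}]'::jsonb),
        ('[{"id": "a2", "group": {"id": "g1", "name": "Naxals"}},
           {"id": "a3", "group": {"id": "g2", "name": "Indian Security Forces"}}]'::jsonb)
)
-- Expand every array element into a row, keep only the nested "group"
-- object, and aggregate the distinct objects back into one array.
select jsonb_agg(distinct elem -> 'group') as unique_groups
from sample
cross join lateral jsonb_array_elements(sample.involved_parties) as elem;
```

The per-relationship id is dropped before de-duplicating, so the same group appearing in several incidents collapses to one entry.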
I would appreciate any pointers here.
Thanks,
Mark.

OK, so I have a working query that runs in an acceptable time frame. It feels ugly, so if there is an obvious way to improve it, please let me know.

with base_data as (
       /*This is where the query for incidents/static assets goes*/
       select affected_sectors, involved_parties, reported_at, tags, casualties 
       from incidents
       ------------------------------------------------------------
)
select  /*unique affected_sectors*/ 
       (
           select jsonb_agg(ssect.sector)
           from (
               select sect.sector
               from base_data,
                jsonb_to_recordset(base_data.affected_sectors) as sect(id varchar, sector jsonb)
               group by sect.sector
               ) ssect
       ) unique_sectors,
       /*unique involved parties*/
       (
           select jsonb_agg(spart."group")
           from    (
               select grp."group"
               from base_data,
               jsonb_to_recordset(base_data.involved_parties) as grp(id varchar, "type" varchar, "group" jsonb, involvement varchar)
               group by grp."group"
           ) spart
       ) unique_groups,
       /*min reported at date*/
       (
           select min(reported_at) from base_data 
       ) min_reported_at,
       /*max reported at date*/
       (
           select max(reported_at) from base_data 
       ) max_reported_at,
       /*unique tags*/
       (
           select jsonb_agg(stags.tags)
           from    (
               select value tags 
               from base_data, 
               jsonb_array_elements(base_data.tags) 
               group by value
           ) stags
       ) unique_tags,
       /*summary casualty counts*/
       (
           select json_object_agg(key, val)
           from (
               select key, sum(value::numeric) val
               from base_data cas, jsonb_each_text(cas.casualties)
               group by key
               ) scas
       ) casualty_counts,
       /*Incident Count*/
       (
           select count(1) from base_data
       ) incident_count

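On our database, for 10,000 incidents, the execution time with a cleared cache is about 700 ms. I would like to get it below 200 ms and will keep hacking at it. If I come up with anything more useful, I will add a comment.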
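One possible simplification, untested against the real data: the count, min and max are computed above in three separate scalar subqueries, each reading base_data again. They could be computed in a single pass instead. The sketch below uses a made-up base_data CTE in place of the real one, and whether it saves a measurable amount of time here is an assumption, not a measured result.

```sql
-- Hypothetical stand-in for the base_data CTE in the query above.
with base_data(reported_at) as (
    values (1592265600::int8), (1592352000::int8), (null::int8)
)
-- One scan yields all three scalar aggregates;
-- min/max ignore the null, count(*) counts every row.
select count(*) as incident_count,
       min(reported_at) as min_reported_at,
       max(reported_at) as max_reported_at
from base_data;
```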

Source: https://dba.stackexchange.com/questions/269202
