PostgreSQL
Table design and query optimization: querying a list of work items to find suitable work
I have a table with a jsonb column, as shown below:
CREATE TABLE work ( id SERIAL NOT NULL, work_data JSONB );
Sample data looks like this:
100 {"work_id": [7245, 3991, 3358, 1028]}
I created a GIN index on work_id as follows:
CREATE INDEX idzworkdata ON work USING gin ((work_data -> 'work_id'));
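The Postgres documentation says that GIN indexes work for the @> containment operator. However, I need to find all work records whose work_id values are all among the IDs supplied by the user, and for that I need the <@ operator.
Link to the Postgres documentation: https://www.postgresql.org/docs/current/datatype-json.html
Section 8.14.4:
"The default GIN operator class for jsonb supports queries with the @>, ?, ?& and ?| operators. (For details of the semantics that these operators implement, see Table 9-41.) An example of creating an index with this operator class is"
When I run the following query: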
select * from public.work where work_json ->'skill' <@ '[ 3587, 3422,7250, 458 ]'
Execution plan:
Gather  (cost=1000.00..246319.01 rows=10000 width=114) (actual time=0.568..2647.415 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on work  (cost=0.00..244319.01 rows=4167 width=114) (actual time=1746.766..2627.820 rows=0 loops=3)
        Filter: ((work_json -> 'skill'::text) <@ '[3587, 3422, 7250, 458]'::jsonb)
        Rows Removed by Filter: 3333333
Planning Time: 1.456 ms
Execution Time: 2647.470 ms
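The query does not use the GIN index. Is there any workaround that lets the <@ operator use a GIN index?
Update 2:
An approach that is not specific to Postgres:
The query takes about 40 to 50 seconds, which is far too long.
I used two tables: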
CREATE TABLE public.work (
    id integer NOT NULL DEFAULT nextval('work_id_seq'::regclass),
    work_data_id integer[],
    work_json jsonb
);
CREATE TABLE public.work_data (
    work_data_id bigint,
    work_id bigint
);
Query:
select work.id
from work
inner join work_data on (work.id = work_data.work_id)
group by work.id
having sum(case when work_data.work_data_id in (2269,3805,828,9127) then 0 else 1 end) = 0
Finalize GroupAggregate  (cost=3618094.30..6459924.90 rows=50000 width=4) (actual time=41891.301..64750.815 rows=1 loops=1)
  Group Key: work.id
  Filter: (sum(CASE WHEN (work_data.work_data_id = ANY ('{2269,3805,828,9127}'::bigint[])) THEN 0 ELSE 1 END) = 0)
  Rows Removed by Filter: 9999999
  ->  Gather Merge  (cost=3618094.30..6234924.88 rows=20000002 width=12) (actual time=41891.217..58887.351 rows=10000581 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial GroupAggregate  (cost=3617094.28..3925428.38 rows=10000001 width=12) (actual time=41792.169..53183.859 rows=3333527 loops=3)
              Group Key: work.id
              ->  Sort  (cost=3617094.28..3658761.10 rows=16666727 width=12) (actual time=41792.125..45907.253 rows=13333333 loops=3)
                    Sort Key: work.id
                    Sort Method: external merge  Disk: 339000kB
                    Worker 0:  Sort Method: external merge  Disk: 338992kB
                    Worker 1:  Sort Method: external merge  Disk: 339784kB
                    ->  Parallel Hash Join  (cost=291846.01..1048214.42 rows=16666727 width=12) (actual time=13844.982..23748.244 rows=13333333 loops=3)
                          Hash Cond: (work_data.work_id = work.id)
                          ->  Parallel Seq Scan on work_data  (cost=0.00..382884.27 rows=16666727 width=16) (actual time=0.020..4094.341 rows=13333333 loops=3)
                          ->  Parallel Hash  (cost=223485.67..223485.67 rows=4166667 width=4) (actual time=3345.351..3345.351 rows=3333334 loops=3)
                                Buckets: 131072  Batches: 256  Memory Usage: 2592kB
                                ->  Parallel Seq Scan on work  (cost=0.00..223485.67 rows=4166667 width=4) (actual time=0.182..1603.437 rows=3333334 loops=3)
Planning Time: 1.544 ms
Execution Time: 65503.341 ms
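Note, a little background: the work table contains the work details and the work IDs required to perform each piece of work. Each user can perform certain work IDs, and that set is a superset of the work IDs of any work the user can take on, so the user always has more work IDs than a matching work record. I also tried keeping the work table and the list of work IDs as separate tables and running a normal join query, but that query does a full table scan and takes about 40 seconds, which is far too long.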
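The requirement described in the background amounts to an "is contained by" test: a work record qualifies when every one of its work IDs appears in the user's list. A minimal sketch of those semantics, using literal values only (the column alias user_can_do_work is made up for illustration):

```sql
-- Left side: IDs required by one work record. Right side: IDs the user can perform.
-- Every required ID is in the user's list, so this is true.
SELECT ARRAY[7245, 3991, 3358, 1028]
       <@ ARRAY[1, 2, 3, 5, 6, 11, 7245, 3991, 3358, 1028] AS user_can_do_work;

-- The user lacks 3358 and 1028, so this is false.
SELECT ARRAY[7245, 3991, 3358, 1028]
       <@ ARRAY[7245, 3991] AS user_can_do_work;

-- The same test written against jsonb arrays, as in the question; this is true.
SELECT '[7245, 3991, 3358, 1028]'::jsonb
       <@ '[1, 2, 3, 5, 6, 11, 7245, 3991, 3358, 1028]'::jsonb AS user_can_do_work;
```

Note that the stored value sits on the left of <@ and the user's input on the right, which is the reverse of the usual @> lookup.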
You can use a helper function that converts a jsonb array to an integer array:
CREATE FUNCTION jsonarr2intarr(text) RETURNS int[]
   LANGUAGE sql IMMUTABLE AS
$$SELECT translate($1, '[]', '{}')::int[]$$;
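As a quick sanity check, and assuming the jsonarr2intarr function above has already been created, the conversion can be tried on the sample document from the question. The function only rewrites the brackets of the text form, so this sketch assumes the JSON array holds nothing but integers; a string element would make the cast to int[] fail.

```sql
-- ->> returns the JSON array as text, e.g. '[7245, 3991, 3358, 1028]';
-- translate() turns the square brackets into braces and the cast parses it as int[].
SELECT jsonarr2intarr(
         '{"work_id": [7245, 3991, 3358, 1028]}'::jsonb ->> 'work_id'
       ) AS work_ids;
-- work_ids: {7245,3991,3358,1028}
```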
This can be used with an index:
CREATE INDEX ON work USING gin (jsonarr2intarr(work_data ->> 'work_id'));
The modified query can use that index:
EXPLAIN (COSTS OFF)
SELECT * FROM work
WHERE jsonarr2intarr(work_data ->> 'work_id')
      <@ ARRAY[1,2,3,5,6,11,7245,3991,3358,1028];

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on work
   Recheck Cond: (jsonarr2intarr((work_data ->> 'work_id'::text)) <@ '{1,2,3,5,6,11,7245,3991,3358,1028}'::integer[])
   ->  Bitmap Index Scan on work_jsonarr2intarr_idx
         Index Cond: (jsonarr2intarr((work_data ->> 'work_id'::text)) <@ '{1,2,3,5,6,11,7245,3991,3358,1028}'::integer[])
(4 rows)
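Since the list of IDs comes from user input, one way to run the lookup is with the list passed as a single int[] parameter. This is only a sketch: it assumes the work table, the jsonarr2intarr function and the expression index from this answer exist, and the statement name find_work is made up for illustration. The WHERE expression has to match the indexed expression exactly, otherwise the planner cannot consider the index.

```sql
-- Hypothetical prepared statement; $1 is the list of IDs the user can perform.
PREPARE find_work(int[]) AS
SELECT id, work_data
FROM work
WHERE jsonarr2intarr(work_data ->> 'work_id') <@ $1;

-- Returns the work records whose work_id array is fully covered by the list.
EXECUTE find_work(ARRAY[1,2,3,5,6,11,7245,3991,3358,1028]);

-- Drop the prepared statement when it is no longer needed.
DEALLOCATE find_work;
```

Whether the planner actually picks the bitmap index scan depends on table size and statistics, so it is worth confirming with EXPLAIN on real data.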