PostgreSQL

Table design and query optimization: querying a list of work items to find suitable work

August 6, 2019

I have a table with a jsonb column, as follows:

CREATE TABLE
   work
   (
       id SERIAL NOT NULL,
       work_data JSONB
   );

Sample data looks like this:

100 {"work_id": [7245, 3991, 3358, 1028]}

I created a GIN index on work_id, as follows:

CREATE INDEX idzworkdata ON work USING gin ((work_data -> 'work_id'));

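The Postgres documentation says that GIN indexes work with the @> containment operator. However, I need to find all work records whose work_ids are all among the work_ids entered by the user, and for that I need the <@ operator.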

Link to the Postgres documentation: https://www.postgresql.org/docs/current/datatype-json.html

Section 8.14.4:

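"The default GIN operator class for jsonb supports queries with the @>, ?, ?& and ?| operators. (For details of the semantics that these operators implement, see Table 9-41.) An example of creating an index with this operator class is"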
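The two operators answer opposite questions. Below is a minimal Python model of containment for flat jsonb arrays of scalars. `a @> b` holds when every element of `b` occurs in `a`, and `a <@ b` is the same test with the operands swapped. Order and duplicates are ignored, as in PostgreSQL. This is a simplified sketch: the function names are hypothetical, and real jsonb containment also handles nested objects and arrays.

```python
from typing import Iterable


def jsonb_contains(left: Iterable[int], right: Iterable[int]) -> bool:
    """Model of `left @> right` for flat arrays: every element of right occurs in left."""
    return set(right) <= set(left)


def jsonb_contained_by(left: Iterable[int], right: Iterable[int]) -> bool:
    """Model of `left <@ right`: the same test with the operands swapped."""
    return jsonb_contains(right, left)


# A work item's required IDs and a user's (larger) set of IDs.
work_ids = [7245, 3991, 3358, 1028]
user_ids = [1, 2, 3, 7245, 3991, 3358, 1028]

# The question needs "work_ids <@ user_ids": every required ID is available.
print(jsonb_contained_by(work_ids, user_ids))  # True
# "@>" answers the opposite question: does the work item require all of the user's IDs?
print(jsonb_contains(work_ids, user_ids))      # False
```

This is why an index that only accelerates `@>` does not help directly. The question here is whether the stored array is a subset of the user's input, not a superset of it.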

When I run the following query:

select *
from public.work
where work_json -> 'skill' <@ '[3587, 3422, 7250, 458]';

Execution plan:

Gather  (cost=1000.00..246319.01 rows=10000 width=114) (actual time=0.568..2647.415 rows=1 loops=1)                          
 Workers Planned: 2                                                                                                         
 Workers Launched: 2                                                                                                        
 ->  Parallel Seq Scan on work  (cost=0.00..244319.01 rows=4167 width=114) (actual time=1746.766..2627.820 rows=0 loops=3)  
       Filter: ((work_json -> 'skill'::text) <@ '[3587, 3422, 7250, 458]'::jsonb)                                           
       Rows Removed by Filter: 3333333                                                                                      
Planning Time: 1.456 ms                                                                                                      
Execution Time: 2647.470 ms

The query does not use the GIN index. Is there a workaround that lets me use a GIN index with the <@ operator?

Update 2:

A non-Postgres-specific approach:

The query takes about 40 to 50 seconds, which is far too slow.

I used two tables:

CREATE TABLE public.work
(
   id integer NOT NULL DEFAULT nextval('work_id_seq'::regclass),
   work_data_id integer[],
   work_json jsonb
);

CREATE TABLE public.work_data
(
   work_data_id bigint,
   work_id bigint
);

Query:

select work.id
from work
  inner join work_data on (work.id = work_data.work_id)
group by work.id
having sum(case when work_data.work_data_id in (2269, 3805, 828, 9127) then 0 else 1 end) = 0;

Execution plan:

Finalize GroupAggregate  (cost=3618094.30..6459924.90 rows=50000 width=4) (actual time=41891.301..64750.815 rows=1 loops=1)                                      
 Group Key: work.id                                                                                                                                             
 Filter: (sum(CASE WHEN (work_data.work_data_id = ANY ('{2269,3805,828,9127}'::bigint[])) THEN 0 ELSE 1 END) = 0)                                               
 Rows Removed by Filter: 9999999                                                                                                                                
 ->  Gather Merge  (cost=3618094.30..6234924.88 rows=20000002 width=12) (actual time=41891.217..58887.351 rows=10000581 loops=1)                                
       Workers Planned: 2                                                                                                                                       
       Workers Launched: 2                                                                                                                                      
       ->  Partial GroupAggregate  (cost=3617094.28..3925428.38 rows=10000001 width=12) (actual time=41792.169..53183.859 rows=3333527 loops=3)                 
             Group Key: work.id                                                                                                                                 
             ->  Sort  (cost=3617094.28..3658761.10 rows=16666727 width=12) (actual time=41792.125..45907.253 rows=13333333 loops=3)                            
                   Sort Key: work.id                                                                                                                            
                   Sort Method: external merge  Disk: 339000kB                                                                                                  
                   Worker 0:  Sort Method: external merge  Disk: 338992kB                                                                                       
                   Worker 1:  Sort Method: external merge  Disk: 339784kB                                                                                       
                   ->  Parallel Hash Join  (cost=291846.01..1048214.42 rows=16666727 width=12) (actual time=13844.982..23748.244 rows=13333333 loops=3)         
                         Hash Cond: (work_data.work_id = work.id)                                                                                               
                         ->  Parallel Seq Scan on work_data  (cost=0.00..382884.27 rows=16666727 width=16) (actual time=0.020..4094.341 rows=13333333 loops=3)  
                         ->  Parallel Hash  (cost=223485.67..223485.67 rows=4166667 width=4) (actual time=3345.351..3345.351 rows=3333334 loops=3)              
                               Buckets: 131072  Batches: 256  Memory Usage: 2592kB                                                                              
                               ->  Parallel Seq Scan on work  (cost=0.00..223485.67 rows=4166667 width=4) (actual time=0.182..1603.437 rows=3333334 loops=3)    
Planning Time: 1.544 ms                                                                                                                                          
Execution Time: 65503.341 ms 
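The HAVING clause above is a form of relational division. After the join, a work.id is kept only if none of its work_data rows has a work_data_id outside the user's list: the CASE expression counts the "outside" rows, and that sum must be 0. The Python sketch below mirrors the logic on in-memory (work_id, work_data_id) pairs. The sample rows and the function name are made up for illustration. It also shows why the plan above is expensive: every joined row has to be grouped before the filter can discard anything.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def works_covered_by(rows: Iterable[Tuple[int, int]], user_ids: Set[int]) -> List[int]:
    """Mirror of
         GROUP BY work.id
         HAVING sum(CASE WHEN work_data_id IN (...) THEN 0 ELSE 1 END) = 0
    where rows are the (work_id, work_data_id) pairs produced by the inner join."""
    outside: Dict[int, int] = defaultdict(int)
    for work_id, work_data_id in rows:
        # CASE ... THEN 0 ELSE 1: count each required ID the user does not have
        outside[work_id] += 0 if work_data_id in user_ids else 1
    # HAVING ... = 0: keep only groups with no missing IDs (sorted for stable output)
    return sorted(work_id for work_id, missing in outside.items() if missing == 0)


# Hypothetical sample data: work 1 needs {2269, 3805}, work 2 needs {2269, 42}.
rows = [(1, 2269), (1, 3805), (2, 2269), (2, 42)]
print(works_covered_by(rows, {2269, 3805, 828, 9127}))  # [1]
```

As with the inner join, a work item with no work_data rows never forms a group, so it is not returned.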

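Note, some background: the work table contains the details needed to perform each work item, along with the corresponding work IDs. Each user can perform certain work IDs, and a user's work IDs are a superset of the work IDs of any work item they can take on, so the user always has more work IDs. I tried keeping the work table and the work-ID list as separate tables and running an ordinary join query, but the query does a full table scan and takes about 40 seconds, which is far too slow.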

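You can use a helper function that converts the jsonb array to an integer array. Declaring it IMMUTABLE is what allows it to be used in an index expression: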

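-- Rewrites JSON array text such as '[1, 2, 3]' into the array literal '{1, 2, 3}' and casts it to int[]
CREATE FUNCTION jsonarr2intarr(text) RETURNS int[]
  LANGUAGE sql IMMUTABLE AS
$$SELECT translate($1, '[]', '{}')::int[]$$;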
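The helper works by plain text substitution. `->>` returns the array as text, such as `[7245, 3991, 3358, 1028]`. `translate` then swaps the square brackets for curly braces to produce a PostgreSQL array literal, and the cast parses that literal as int[]. Below is a Python model of those steps for illustration only. It assumes a flat array of integers, and its parsing is simplified compared with PostgreSQL's array input.

```python
from typing import List, Optional


def jsonarr2intarr(text: Optional[str]) -> Optional[List[int]]:
    """Python model of translate($1, '[]', '{}')::int[] for a flat integer array."""
    if text is None:
        # SQL NULL (e.g. the 'work_id' key is missing) propagates through translate and the cast
        return None
    # translate($1, '[]', '{}'): '[' becomes '{' and ']' becomes '}'
    literal = text.translate(str.maketrans("[]", "{}"))
    # ::int[]: strip the braces and parse the comma-separated elements
    body = literal.strip()[1:-1].strip()
    return [int(part) for part in body.split(",")] if body else []


print(jsonarr2intarr("[7245, 3991, 3358, 1028]"))  # [7245, 3991, 3358, 1028]
print(jsonarr2intarr("[]"))                        # []
```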

The function can then be used in an expression index:

CREATE INDEX ON work USING gin (jsonarr2intarr(work_data ->> 'work_id'));

A query rewritten to use exactly the same expression can use that index:

EXPLAIN (COSTS OFF)
SELECT * FROM work
WHERE jsonarr2intarr(work_data ->> 'work_id')
     <@ ARRAY[1,2,3,5,6,11,7245,3991,3358,1028];

                                                       QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on work
  Recheck Cond: (jsonarr2intarr((work_data ->> 'work_id'::text)) <@ '{1,2,3,5,6,11,7245,3991,3358,1028}'::integer[])
  ->  Bitmap Index Scan on work_jsonarr2intarr_idx
        Index Cond: (jsonarr2intarr((work_data ->> 'work_id'::text)) <@ '{1,2,3,5,6,11,7245,3991,3358,1028}'::integer[])
(4 rows)
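The plan shows a Bitmap Index Scan followed by a Recheck Cond. Below is a simplified Python model of why `<@` needs a recheck with a GIN index on an int[] expression. The index maps each element to the rows that contain it, so by itself it can only produce candidates: rows with at least one element from the query array, plus rows whose array is empty (an empty array is contained in anything). Whether every element of a candidate is in the query array is then checked against the row itself. This is an illustration of the idea under those assumptions, not PostgreSQL's implementation, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyArrayGin:
    """Toy inverted index over int arrays, illustrating a GIN-style <@ lookup."""

    def __init__(self, rows: Dict[int, List[int]]) -> None:
        self.rows = rows                                      # row id -> array (the "heap")
        self.postings: Dict[int, Set[int]] = defaultdict(set)  # element -> row ids
        self.empty_rows: Set[int] = set()                     # rows whose array is empty
        for row_id, arr in rows.items():
            if not arr:
                self.empty_rows.add(row_id)
            for elem in arr:
                self.postings[elem].add(row_id)

    def contained_by(self, query: List[int]) -> List[int]:
        wanted = set(query)
        # "Bitmap Index Scan": union of the posting lists for the query's elements,
        # plus empty arrays, which are contained in any array.
        candidates = set(self.empty_rows)
        for elem in wanted:
            candidates |= self.postings.get(elem, set())
        # "Recheck Cond": the index cannot prove that every element of a candidate
        # is in the query, so each candidate row is tested directly.
        return sorted(r for r in candidates if set(self.rows[r]) <= wanted)


index = ToyArrayGin({100: [7245, 3991, 3358, 1028], 101: [7245, 9999], 102: [42]})
print(index.contained_by([1, 2, 3, 5, 6, 11, 7245, 3991, 3358, 1028]))  # [100]
```

Row 101 becomes a candidate because it shares 7245 with the query, but the recheck rejects it because of 9999. Row 102 is never touched.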

Source: https://dba.stackexchange.com/questions/244565