Postgresql

PostgreSQL 是否將帶有 OR 的 JOIN 重寫為 UNION?

  • October 9, 2017

(略)表匯總:

-- Table cases:
    id                  SERIAL       PRIMARY KEY,
    application_number  VARCHAR(30)  NOT NULL,
    publication_number  VARCHAR(30)  NOT NULL,

-- Table patents:
    case_id                 INTEGER      PRIMARY KEY,  -- FK to cases(id)
    pct_application_number  VARCHAR(30)  NOT NULL,
    pct_publication_number  VARCHAR(30)  NOT NULL,

-- All character columns have working trigram indexes:
    CREATE INDEX cases_application_number_trgm_idx ON cases
        USING GIN (application_number gin_trgm_ops);
        -- (etc)

以下查詢很慢(~200ms),因為它不使用索引:

SELECT  c.id
 FROM  cases c
       JOIN patents p ON p.case_id = c.id
WHERE  c.application_number ILIKE '%1234%' OR p.pct_application_number ILIKE '%1234%'

以下查詢都很快(1-2ms):

-- AND instead of OR
WHERE  c.application_number ILIKE '%1234%' AND p.pct_application_number ILIKE '%1234%'

-- OR, but only table "cases"
WHERE  c.application_number ILIKE '%1234%' OR c.publication_number ILIKE '%1234%'

-- OR, but only table "patents"
WHERE  p.pct_application_number ILIKE '%1234%' OR p.pct_publication_number ILIKE '%1234%'

-- Simulating the OR with a UNION
SELECT  c.id
 FROM  cases c
       JOIN patents p ON p.case_id = c.id
WHERE  c.application_number ILIKE '%1234%'
UNION
SELECT  c.id
 FROM  cases c
       JOIN patents p ON p.case_id = c.id
WHERE  p.pct_application_number ILIKE '%1234%'

這是EXPLAIN ANALYZE慢查詢的輸出:

                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Hash Join  (cost=2329.80..10590.54 rows=7 width=4) (actual time=54.951..186.713 rows=35 loops=1)
  Hash Cond: (c.id = p.case_id)
  Join Filter: (((c.application_number)::text ~~* '%1234%'::text) OR ((p.pct_application_number)::text ~~* '%1234%'::text))
  Rows Removed by Join Filter: 68223
  ->  Seq Scan on cases c  (cost=0.00..4981.99 rows=142099 width=12) (actual time=0.011..32.875 rows=142099 loops=1)
  ->  Hash  (cost=1142.58..1142.58 rows=68258 width=11) (actual time=31.105..31.105 rows=68258 loops=1)
        Buckets: 131072  Batches: 2  Memory Usage: 2473kB
        ->  Seq Scan on patents p  (cost=0.00..1142.58 rows=68258 width=11) (actual time=0.019..11.995 rows=68258 loops=1)
Planning time: 1.875 ms
Execution time: 186.780 ms
(10 rows)

此處發布的查詢已大大減少以說明問題。實際查詢更複雜,涉及在五個(或更多)表中的六個(或更多)列中進行文本搜尋,大約有 10 個輸出列。我想我可以將所有這些重寫為一系列查詢並將它們連接成一個巨大的UNION……有沒有更好的方法來處理這個問題?


添加enable_seqscan禁用的查詢計劃(根據要求):

                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=0.71..18767.92 rows=7 width=4) (actual time=4.809..150.368 rows=35 loops=1)
  Merge Cond: (c.id = p.case_id)
  Join Filter: (((c.application_number)::text ~~* '%1234%'::text) OR ((p.pct_application_number)::text ~~* '%1234%'::text))
  Rows Removed by Join Filter: 68223
  ->  Index Scan using cases_pkey on cases c  (cost=0.42..14942.96 rows=142099 width=12) (actual time=0.004..32.695 rows=142097 loops=1)
  ->  Index Scan using patents_pkey on patents p  (cost=0.29..2275.63 rows=68258 width=11) (actual time=0.003..11.942 rows=68258 loops=1)
Planning time: 1.007 ms
Execution time: 150.399 ms
(8 rows)

除了重寫為 UNION (或非規範化/重構數據)之外,恐怕現在對您沒有好的解決方案。

有一個建議將OR 自動轉換為 UNION,但它需要更多的測試和審查才能使其進入 v11 PostgreSQL,但我已經驗證它確實適用於像你這樣的案例。

引用自:https://dba.stackexchange.com/questions/188029