Postgresql
PostgreSQL 是否將帶有 OR 的 JOIN 重寫為 UNION?
(略)表匯總:
-- Table cases: id SERIAL PRIMARY KEY, application_number VARCHAR(30) NOT NULL, publication_number VARCHAR(30) NOT NULL, -- Table patents: case_id INTEGER PRIMARY KEY, -- FK to cases(id) pct_application_number VARCHAR(30) NOT NULL, pct_publication_number VARCHAR(30) NOT NULL, -- All character columns have working trigram indexes: CREATE INDEX cases_application_number_trgm_idx ON cases USING GIN (application_number gin_trgm_ops); -- (etc)
以下查詢很慢(~200ms),因為它不使用索引:
SELECT c.id FROM cases c JOIN patents p ON p.case_id = c.id WHERE c.application_number ILIKE '%1234%' OR p.pct_application_number ILIKE '%1234%'
以下查詢都很快(1-2ms):
-- AND instead of OR WHERE c.application_number ILIKE '%1234%' AND p.pct_application_number ILIKE '%1234%' -- OR, but only table "cases" WHERE c.application_number ILIKE '%1234%' OR c.publication_number ILIKE '%1234%' -- OR, but only table "patents" WHERE p.pct_application_number ILIKE '%1234%' OR p.pct_publication_number ILIKE '%1234%' -- Simulating the OR with a UNION SELECT c.id FROM cases c JOIN patents p ON p.case_id = c.id WHERE c.application_number ILIKE '%1234%' UNION SELECT c.id FROM cases c JOIN patents p ON p.case_id = c.id WHERE p.pct_application_number ILIKE '%1234%'
這是
EXPLAIN ANALYZE
慢查詢的輸出:QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=2329.80..10590.54 rows=7 width=4) (actual time=54.951..186.713 rows=35 loops=1) Hash Cond: (c.id = p.case_id) Join Filter: (((c.application_number)::text ~~* '%1234%'::text) OR ((p.pct_application_number)::text ~~* '%1234%'::text)) Rows Removed by Join Filter: 68223 -> Seq Scan on cases c (cost=0.00..4981.99 rows=142099 width=12) (actual time=0.011..32.875 rows=142099 loops=1) -> Hash (cost=1142.58..1142.58 rows=68258 width=11) (actual time=31.105..31.105 rows=68258 loops=1) Buckets: 131072 Batches: 2 Memory Usage: 2473kB -> Seq Scan on patents p (cost=0.00..1142.58 rows=68258 width=11) (actual time=0.019..11.995 rows=68258 loops=1) Planning time: 1.875 ms Execution time: 186.780 ms (10 rows)
此處發布的查詢已大大減少以說明問題。實際查詢更複雜,涉及在五個(或更多)表中的六個(或更多)列中進行文本搜尋,大約有 10 個輸出列。我想我可以將所有這些重寫為一系列查詢並將它們連接成一個巨大的
UNION
……有沒有更好的方法來處理這個問題?添加
enable_seqscan
禁用的查詢計劃(根據要求):QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=0.71..18767.92 rows=7 width=4) (actual time=4.809..150.368 rows=35 loops=1) Merge Cond: (c.id = p.case_id) Join Filter: (((c.application_number)::text ~~* '%1234%'::text) OR ((p.pct_application_number)::text ~~* '%1234%'::text)) Rows Removed by Join Filter: 68223 -> Index Scan using cases_pkey on cases c (cost=0.42..14942.96 rows=142099 width=12) (actual time=0.004..32.695 rows=142097 loops=1) -> Index Scan using patents_pkey on patents p (cost=0.29..2275.63 rows=68258 width=11) (actual time=0.003..11.942 rows=68258 loops=1) Planning time: 1.007 ms Execution time: 150.399 ms (8 rows)
除了重寫為 UNION (或非規範化/重構數據)之外,恐怕現在對您沒有好的解決方案。
有一個建議將OR 自動轉換為 UNION,但它需要更多的測試和審查才能使其進入 v11
PostgreSQL
,但我已經驗證它確實適用於像你這樣的案例。