PostgreSQL
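Why is an OR statement slower than UNION?
Database version: PostgreSQL 12.6
I have a table with 600,000 records.
The table has the following columns:
- name (varchar)
- location_type (int), enum values: (1, 2, 3)
- ancestry (varchar)
Indexes:
- ancestry (btree)
The ancestry column is a way to build a tree: each row has an ancestry value that contains all of its parent IDs, separated by "/".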
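To make the ancestry encoding concrete, here is a minimal Python sketch. The sample rows, the "0" root marker, and the helper names `parent_ids` and `child_ancestry` are assumptions made for illustration; they are not taken from the real table.

```python
# Hypothetical sample rows: (id, ancestry). The "0" root marker is an
# assumption of this sketch, not something stated about the real data.
rows = [
    (1, "0"),
    (2, "0/1"),
    (3, "0/1/2"),
    (4, "0/1/2/3"),
    (5, "0/1"),
]


def parent_ids(ancestry: str) -> list[int]:
    """Split an ancestry string into the list of parent IDs it encodes."""
    return [int(part) for part in ancestry.split("/")]


def child_ancestry(ancestry: str, row_id: int) -> str:
    """Ancestry value that any direct child of this row would carry."""
    return f"{ancestry}/{row_id}"


# A row has children when some other row stores its child_ancestry value.
all_ancestries = {ancestry for _, ancestry in rows}
rows_with_children = [
    row_id
    for row_id, ancestry in rows
    if child_ancestry(ancestry, row_id) in all_ancestries
]
```

With this sample data, `rows_with_children` holds the IDs 1, 2 and 3. The comparison against `child_ancestry` is the same test that the EXISTS subquery in the queries that follow expresses with CONCAT.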
Consider the following examples.
The following query takes 686 ms to execute:
SELECT * FROM geolocations WHERE EXISTS ( SELECT 1 FROM geolocations g2 WHERE g2.ancestry = CONCAT(geolocations.ancestry, '/', geolocations.id) )
This query executes in 808 ms:
SELECT * FROM geolocations WHERE location_type = 2
When the two queries are combined with OR, the result takes about 4475 ms to finish, if it finishes at all:
SELECT * FROM geolocations WHERE EXISTS ( SELECT 1 FROM geolocations g2 WHERE g2.ancestry = CONCAT(geolocations.ancestry, '/', geolocations.id) ) OR location_type = 2
Explain:
[ { "Plan": { "Node Type": "Seq Scan", "Parallel Aware": false, "Relation Name": "geolocations", "Alias": "geolocations", "Startup Cost": 0, "Total Cost": 2760473.54, "Plan Rows": 582910, "Plan Width": 68, "Filter": "((SubPlan 1) OR (location_type = 2))", "Plans": [ { "Node Type": "Index Only Scan", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Parallel Aware": false, "Scan Direction": "Forward", "Index Name": "index_geolocations_on_ancestry", "Relation Name": "geolocations", "Alias": "g2", "Startup Cost": 0.43, "Total Cost": 124.91, "Plan Rows": 30, "Plan Width": 0, "Index Cond": "(ancestry = concat(geolocations.ancestry, '/', geolocations.id))" } ] }, "JIT": { "Worker Number": -1, "Functions": 8, "Options": { "Inlining": true, "Optimization": true, "Expressions": true, "Deforming": true } } } ]
Combining them with a UNION takes 1916 ms:
SELECT * FROM geolocations WHERE EXISTS ( SELECT 1 FROM geolocations g2 WHERE g2.ancestry = CONCAT(geolocations.ancestry, '/', geolocations.id) ) UNION SELECT * FROM geolocations WHERE location_type = 2
Explain:
[ { "Plan": { "Node Type": "Unique", "Parallel Aware": false, "Startup Cost": 308693.44, "Total Cost": 332506.74, "Plan Rows": 865938, "Plan Width": 188, "Plans": [ { "Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware": false, "Startup Cost": 308693.44, "Total Cost": 310858.29, "Plan Rows": 865938, "Plan Width": 188, "Sort Key": [ "geolocations.id", "geolocations.name", "geolocations.location_type", "geolocations.pricing", "geolocations.ancestry", "geolocations.geolocationable_id", "geolocations.geolocationable_type", "geolocations.created_at", "geolocations.updated_at", "geolocations.info" ], "Plans": [ { "Node Type": "Append", "Parent Relationship": "Outer", "Parallel Aware": false, "Startup Cost": 15851.41, "Total Cost": 63464.05, "Plan Rows": 865938, "Plan Width": 188, "Subplans Removed": 0, "Plans": [ { "Node Type": "Hash Join", "Parent Relationship": "Member", "Parallel Aware": false, "Join Type": "Inner", "Startup Cost": 15851.41, "Total Cost": 35074.94, "Plan Rows": 299882, "Plan Width": 68, "Inner Unique": true, "Hash Cond": "(concat(geolocations.ancestry, '/', geolocations.id) = (g2.ancestry)::text)", "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Parallel Aware": false, "Relation Name": "geolocations", "Alias": "geolocations", "Startup Cost": 0, "Total Cost": 13900.63, "Plan Rows": 599763, "Plan Width": 68 }, { "Node Type": "Hash", "Parent Relationship": "Inner", "Parallel Aware": false, "Startup Cost": 15600.65, "Total Cost": 15600.65, "Plan Rows": 20061, "Plan Width": 12, "Plans": [ { "Node Type": "Aggregate", "Strategy": "Hashed", "Partial Mode": "Simple", "Parent Relationship": "Outer", "Parallel Aware": false, "Startup Cost": 15400.04, "Total Cost": 15600.65, "Plan Rows": 20061, "Plan Width": 12, "Group Key": [ "(g2.ancestry)::text" ], "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Parallel Aware": false, "Relation Name": "geolocations", "Alias": "g2", "Startup Cost": 0, "Total Cost": 
13900.63, "Plan Rows": 599763, "Plan Width": 12 } ] } ] } ] }, { "Node Type": "Seq Scan", "Parent Relationship": "Member", "Parallel Aware": false, "Relation Name": "geolocations", "Alias": "geolocations_1", "Startup Cost": 0, "Total Cost": 15400.04, "Plan Rows": 566056, "Plan Width": 68, "Filter": "(location_type = 2)" } ] } ] } ] }, "JIT": { "Worker Number": -1, "Functions": 15, "Options": { "Inlining": false, "Optimization": false, "Expressions": true, "Deforming": true } } } ]
Why does PostgreSQL execute the OR query so much more slowly?
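PostgreSQL and many other RDBMSs often struggle with OR predicates. What often happens, and what has happened in this case, is that the planner decides it cannot satisfy the two OR conditions with a single seek, and instead scans the whole table or index, evaluating both (or more) conditions for every row. In the OR plan above this shows up as a Seq Scan whose filter runs the EXISTS subquery as a SubPlan for each row. It does this even though an index union is the more obvious approach (to a human).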
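What you are doing is a very common trick to help the planner and force an index union. Each side is now planned and evaluated entirely separately, which in this case is much faster. In the UNION plan above, the EXISTS branch becomes a hash join instead of a per-row subplan.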
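As a rough check that the rewrite keeps the same result, the sketch below runs both forms against a tiny in-memory SQLite database from Python. It only demonstrates the semantics, not PostgreSQL's planner or timings. The sample rows, the "0" root marker, the `g1` alias, and the use of SQLite's `||` operator in place of CONCAT are all assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE geolocations ("
    "id INTEGER PRIMARY KEY, name TEXT, location_type INTEGER, ancestry TEXT)"
)
conn.execute(
    "CREATE INDEX index_geolocations_on_ancestry ON geolocations (ancestry)"
)
# Hypothetical rows; "0" is an assumed root marker.
conn.executemany(
    "INSERT INTO geolocations VALUES (?, ?, ?, ?)",
    [
        (1, "World", 1, "0"),
        (2, "Europe", 1, "0/1"),
        (3, "France", 2, "0/1/2"),
        (4, "Paris", 3, "0/1/2/3"),
        (5, "Asia", 2, "0/1"),
    ],
)

# SQLite spells string concatenation as ||, standing in for CONCAT here.
has_child = (
    "SELECT id FROM geolocations g1 WHERE EXISTS ("
    "SELECT 1 FROM geolocations g2 "
    "WHERE g2.ancestry = g1.ancestry || '/' || g1.id)"
)
by_type = "SELECT id FROM geolocations WHERE location_type = 2"

or_sql = has_child + " OR location_type = 2"
union_sql = has_child + " UNION " + by_type
union_all_sql = has_child + " UNION ALL " + by_type

or_ids = sorted(r[0] for r in conn.execute(or_sql))
union_ids = sorted(r[0] for r in conn.execute(union_sql))
union_all_ids = sorted(r[0] for r in conn.execute(union_all_sql))
```

Here `or_ids` and `union_ids` contain the same IDs, while `union_all_ids` lists row 3 twice because that row satisfies both conditions. UNION matches OR only because it removes duplicates, and only while the selected rows are themselves distinct (for example, when a primary key is included).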
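It may not always be faster, for example if location_type = 2 matches a very large proportion of the table. The benefit is more obvious when the two conditions perform very differently. For example, in
WHERE id = @id OR someName = @name
the first condition is a direct seek to a single row, whereas the second is a seek to a few rows. The planner cannot satisfy this with a single seek, so it often falls back to scanning the whole table. An index union helps here, because it can use one index on id and another index on someName.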
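The idea of an index union can be sketched in plain Python, with dictionaries standing in for the two indexes. This is a conceptual illustration only, not how any database implements it; the sample rows and the names `index_on_id`, `index_on_name`, `index_union` and `full_scan` are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: row id -> row.
rows = {
    1: {"id": 1, "someName": "alice"},
    2: {"id": 2, "someName": "bob"},
    3: {"id": 3, "someName": "alice"},
    4: {"id": 4, "someName": "carol"},
}

# Two separate "indexes": each maps a key to the set of matching row ids.
index_on_id = {row_id: {row_id} for row_id in rows}
index_on_name = defaultdict(set)
for row_id, row in rows.items():
    index_on_name[row["someName"]].add(row_id)


def index_union(target_id, target_name):
    """Seek each index on its own, then union the matching row ids."""
    by_id = index_on_id.get(target_id, set())
    by_name = index_on_name.get(target_name, set())
    return sorted(by_id | by_name)


def full_scan(target_id, target_name):
    """Evaluate both OR conditions against every row in the table."""
    return sorted(
        row_id
        for row_id, row in rows.items()
        if row["id"] == target_id or row["someName"] == target_name
    )


union_result = index_union(2, "alice")
scan_result = full_scan(2, "alice")
```

Both functions return the same row IDs, but `index_union` touches only the entries that match, whereas `full_scan` visits every row. The set union also takes care of a row that matches both conditions, which is the role UNION's duplicate removal plays in the SQL rewrite.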