Why is an OR statement slower than UNION?

September 21, 2021

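Database version: PostgreSQL 12.6

I have a table with 600,000 records.

The table has the following columns:

name (varchar)
location_type (int), enum values: (1,2,3)
ancestry (varchar)

Indexes:

ancestry (btree)

The ancestry column is a way to build a tree: each row has an ancestry value that contains all of its parent IDs, separated by '/'.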
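To make the ancestry layout concrete, here is a minimal sketch of the table as described. It covers only the columns listed in the question plus an id column, which the queries reference but whose type is an assumption. The index name is taken from the plan output; the sample rows are made up, and the sketch shows no root row because the question does not say how one is stored.

```sql
-- Hypothetical reconstruction; the real table has more columns
-- (the Sort Key in the UNION plan lists ten).
CREATE TABLE geolocations (
    id            bigint PRIMARY KEY,  -- assumed type and constraint
    name          varchar,
    location_type int,                 -- one of 1, 2, 3
    ancestry      varchar              -- parent IDs joined with '/'
);

CREATE INDEX index_geolocations_on_ancestry
    ON geolocations USING btree (ancestry);

-- Made-up rows: 4 is a child of 1, 9 is a child of 4, 12 is a child of 9.
INSERT INTO geolocations (id, name, location_type, ancestry) VALUES
    (4,  'Region',   1, '1'),
    (9,  'City',     2, '1/4'),
    (12, 'District', 3, '1/4/9');

-- For row 4, CONCAT(ancestry, '/', id) yields '1/4', which equals the
-- ancestry of row 9, so row 4 counts as having at least one child.
SELECT id, CONCAT(ancestry, '/', id) AS child_ancestry
FROM geolocations;
```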

Consider the following example:

The following query takes 686 ms to execute:

SELECT * FROM geolocations
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry =
     CONCAT(geolocations.ancestry, '/', geolocations.id)
)

This query executes in 808 ms:

SELECT * FROM geolocations
WHERE location_type = 2

Combining the two queries with an OR takes around 4475 ms to finish, if it finishes at all.

SELECT * FROM geolocations
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry =
     CONCAT(geolocations.ancestry, '/', geolocations.id)
) OR location_type = 2

Explain:

[
 {
   "Plan": {
     "Node Type": "Seq Scan",
     "Parallel Aware": false,
     "Relation Name": "geolocations",
     "Alias": "geolocations",
     "Startup Cost": 0,
     "Total Cost": 2760473.54,
     "Plan Rows": 582910,
     "Plan Width": 68,
     "Filter": "((SubPlan 1) OR (location_type = 2))",
     "Plans": [
       {
         "Node Type": "Index Only Scan",
         "Parent Relationship": "SubPlan",
         "Subplan Name": "SubPlan 1",
         "Parallel Aware": false,
         "Scan Direction": "Forward",
         "Index Name": "index_geolocations_on_ancestry",
         "Relation Name": "geolocations",
         "Alias": "g2",
         "Startup Cost": 0.43,
         "Total Cost": 124.91,
         "Plan Rows": 30,
         "Plan Width": 0,
         "Index Cond": "(ancestry = concat(geolocations.ancestry, '/', geolocations.id))"
       }
     ]
   },
   "JIT": {
     "Worker Number": -1,
     "Functions": 8,
     "Options": {
       "Inlining": true,
       "Optimization": true,
       "Expressions": true,
       "Deforming": true
     }
   }
 }
]
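The plan above contains only estimates (costs and planned rows), so it does not show where the time actually goes. As a sketch, the same query can be run under EXPLAIN with the ANALYZE option, which adds fields such as "Actual Total Time" and "Actual Loops" to each node. Note that ANALYZE really executes the query, so this takes as long as the query itself.

```sql
-- Adds measured timings, row counts, and loop counts to every plan node.
-- "Actual Loops" on SubPlan 1 shows how many times the subquery was run.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM geolocations
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry =
     CONCAT(geolocations.ancestry, '/', geolocations.id)
) OR location_type = 2;
```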

Combining them with a UNION takes 1916 ms:

SELECT * FROM geolocations
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry =
     CONCAT(geolocations.ancestry, '/', geolocations.id)
) UNION SELECT * FROM geolocations WHERE location_type = 2

Explain:

[
 {
   "Plan": {
     "Node Type": "Unique",
     "Parallel Aware": false,
     "Startup Cost": 308693.44,
     "Total Cost": 332506.74,
     "Plan Rows": 865938,
     "Plan Width": 188,
     "Plans": [
       {
         "Node Type": "Sort",
         "Parent Relationship": "Outer",
         "Parallel Aware": false,
         "Startup Cost": 308693.44,
         "Total Cost": 310858.29,
         "Plan Rows": 865938,
         "Plan Width": 188,
         "Sort Key": [
           "geolocations.id",
           "geolocations.name",
           "geolocations.location_type",
           "geolocations.pricing",
           "geolocations.ancestry",
           "geolocations.geolocationable_id",
           "geolocations.geolocationable_type",
           "geolocations.created_at",
           "geolocations.updated_at",
           "geolocations.info"
         ],
         "Plans": [
           {
             "Node Type": "Append",
             "Parent Relationship": "Outer",
             "Parallel Aware": false,
             "Startup Cost": 15851.41,
             "Total Cost": 63464.05,
             "Plan Rows": 865938,
             "Plan Width": 188,
             "Subplans Removed": 0,
             "Plans": [
               {
                 "Node Type": "Hash Join",
                 "Parent Relationship": "Member",
                 "Parallel Aware": false,
                 "Join Type": "Inner",
                 "Startup Cost": 15851.41,
                 "Total Cost": 35074.94,
                 "Plan Rows": 299882,
                 "Plan Width": 68,
                 "Inner Unique": true,
                 "Hash Cond": "(concat(geolocations.ancestry, '/', geolocations.id) = (g2.ancestry)::text)",
                 "Plans": [
                   {
                     "Node Type": "Seq Scan",
                     "Parent Relationship": "Outer",
                     "Parallel Aware": false,
                     "Relation Name": "geolocations",
                     "Alias": "geolocations",
                     "Startup Cost": 0,
                     "Total Cost": 13900.63,
                     "Plan Rows": 599763,
                     "Plan Width": 68
                   },
                   {
                     "Node Type": "Hash",
                     "Parent Relationship": "Inner",
                     "Parallel Aware": false,
                     "Startup Cost": 15600.65,
                     "Total Cost": 15600.65,
                     "Plan Rows": 20061,
                     "Plan Width": 12,
                     "Plans": [
                       {
                         "Node Type": "Aggregate",
                         "Strategy": "Hashed",
                         "Partial Mode": "Simple",
                         "Parent Relationship": "Outer",
                         "Parallel Aware": false,
                         "Startup Cost": 15400.04,
                         "Total Cost": 15600.65,
                         "Plan Rows": 20061,
                         "Plan Width": 12,
                         "Group Key": [
                           "(g2.ancestry)::text"
                         ],
                         "Plans": [
                           {
                             "Node Type": "Seq Scan",
                             "Parent Relationship": "Outer",
                             "Parallel Aware": false,
                             "Relation Name": "geolocations",
                             "Alias": "g2",
                             "Startup Cost": 0,
                             "Total Cost": 13900.63,
                             "Plan Rows": 599763,
                             "Plan Width": 12
                           }
                         ]
                       }
                     ]
                   }
                 ]
               },
               {
                 "Node Type": "Seq Scan",
                 "Parent Relationship": "Member",
                 "Parallel Aware": false,
                 "Relation Name": "geolocations",
                 "Alias": "geolocations_1",
                 "Startup Cost": 0,
                 "Total Cost": 15400.04,
                 "Plan Rows": 566056,
                 "Plan Width": 68,
                 "Filter": "(location_type = 2)"
               }
             ]
           }
         ]
       }
     ]
   },
   "JIT": {
     "Worker Number": -1,
     "Functions": 15,
     "Options": {
       "Inlining": false,
       "Optimization": false,
       "Expressions": true,
       "Deforming": true
     }
   }
 }
]
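The JIT sections of the two plans differ: the OR plan has Inlining and Optimization enabled, while the UNION plan does not. Assuming the default thresholds (jit_above_cost = 100000, jit_inline_above_cost = 500000, jit_optimize_above_cost = 500000), this is consistent with the estimated total costs of 2760473.54 and 332506.74. The question does not say how much of the 4475 ms is JIT compilation. One way to check, sketched here, is to repeat the measurement with JIT switched off for the session.

```sql
-- Show the current JIT settings (the defaults are an assumption here).
SHOW jit;
SHOW jit_above_cost;
SHOW jit_inline_above_cost;
SHOW jit_optimize_above_cost;

-- Session-local experiment: disable JIT and time the OR query again.
SET jit = off;

EXPLAIN (ANALYZE, FORMAT JSON)
SELECT * FROM geolocations
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry =
     CONCAT(geolocations.ancestry, '/', geolocations.id)
) OR location_type = 2;

-- Restore the previous setting.
RESET jit;
```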

Why does PostgreSQL execute the OR query so much more slowly?

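PostgreSQL and many other RDBMSs often struggle with OR predicates.
What often happens, and has happened in this case, is that the planner decides it cannot satisfy the two OR conditions with a single index seek, so it scans the whole table and evaluates both (or more) conditions on every row. In the OR plan above, that is the Seq Scan whose filter runs SubPlan 1 for each row.
It does this even though an index union is the more obvious approach, at least to a human.
What you are doing is a very common trick that helps the planner and forces an index union. Each side is now evaluated entirely separately, which in this case is much faster.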
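The UNION plan in the question also sorts on all ten columns and applies a Unique node to remove duplicates. As a hedged variation on the same trick, and not something measured in the question, the second branch can exclude rows already returned by the first, so that UNION ALL can be used and no de-duplication step is needed. The aliases g and g2 are only for readability.

```sql
-- Branch 1: rows that have at least one child.
SELECT g.* FROM geolocations g
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry = CONCAT(g.ancestry, '/', g.id)
)
UNION ALL
-- Branch 2: rows with location_type = 2 that branch 1 did not return.
SELECT g.* FROM geolocations g
WHERE g.location_type = 2
  AND NOT EXISTS (
    SELECT 1 FROM geolocations g2
    WHERE g2.ancestry = CONCAT(g.ancestry, '/', g.id)
  );
```

The two branches are disjoint, so every row matching either condition is returned exactly once, as in the OR query. Whether this is faster than the plain UNION depends on the data and should be confirmed with EXPLAIN.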
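It may not always be faster, for example if the rows with location_type = 2 made up a very large proportion of the table. The benefit is much clearer when the two conditions have very different performance profiles.
For example, in WHERE id = @id OR someName = @name, the first condition is a direct seek to a single row, while the second is a seek to a few rows. The planner cannot satisfy both with a single seek, so it often falls back to scanning the whole table. An index union helps here because one index on id and another on someName can each be used.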
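The id / someName case can be sketched in PostgreSQL as follows. The table and index names are hypothetical, and the @id and @name placeholders from the example are written as $1 and $2 in prepared statements. Note that for a simple OR like this one, PostgreSQL can often combine both indexes itself with a BitmapOr node, so it is worth comparing the two plans before rewriting.

```sql
-- Hypothetical table; the primary key supplies the index on id.
CREATE TABLE some_table (
    id        bigint PRIMARY KEY,
    some_name text
);
CREATE INDEX some_table_some_name_idx ON some_table (some_name);

-- OR form: one query, two conditions.
PREPARE by_id_or_name (bigint, text) AS
SELECT * FROM some_table
WHERE id = $1 OR some_name = $2;

-- UNION form: each branch can use its own index independently.
PREPARE by_id_union_name (bigint, text) AS
SELECT * FROM some_table WHERE id = $1
UNION
SELECT * FROM some_table WHERE some_name = $2;

-- Compare the plans the planner picks for each form.
EXPLAIN EXECUTE by_id_or_name(42, 'example');
EXPLAIN EXECUTE by_id_union_name(42, 'example');
```

On an empty or very small table the planner will probably choose a sequential scan for both forms, so the comparison is only meaningful with realistic data and up-to-date statistics.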

Quoted from: https://dba.stackexchange.com/questions/293836
