PostgreSQL

Why is an OR query slower than UNION?

  • September 21, 2021

Database version: PostgreSQL 12.6

I have a table with 600,000 records.

The table has the following columns:

  • name (varchar)
  • location_type (int), enum values: (1, 2, 3)
  • ancestry (varchar)

Indexes:

  • ancestry (btree)

The ancestry column is a way of building a tree: each row has an ancestry value that contains the IDs of all its parents, separated by "/".
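Here is a minimal Python sketch of the ancestry scheme. It shows what the CONCAT in the queries below checks. It assumes that a child's ancestry is its parent's ancestry followed by "/" and the parent's id, which is what CONCAT(geolocations.ancestry, '/', geolocations.id) computes. The sample rows are hypothetical, and so is the convention that a root row has a NULL (None) ancestry.

```python
from typing import Optional

def child_ancestry(ancestry: Optional[str], row_id: int) -> str:
    """Ancestry value that children of this row would store.

    Mirrors PostgreSQL CONCAT(ancestry, '/', id); CONCAT treats a NULL
    argument as an empty string.
    """
    return f"{ancestry or ''}/{row_id}"

def ancestor_ids(ancestry: Optional[str]) -> list:
    """Split an ancestry path into parent ids, outermost ancestor first."""
    if not ancestry:
        return []
    return [int(part) for part in ancestry.split("/") if part]

# Hypothetical sample rows: (id, ancestry)
rows = [(1, None), (2, "/1"), (3, "/1/2"), (4, "/1")]

# Same test as the EXISTS query: a row is a parent if some row stores
# the ancestry value its children would have.
stored = {anc for _, anc in rows}
parents = [rid for rid, anc in rows if child_ancestry(anc, rid) in stored]
print(parents)               # [1, 2]
print(ancestor_ids("/1/2"))  # [1, 2]
```

Rows 1 and 2 count as parents because some row stores "/1" or "/1/2".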

Consider the following examples:

The following query takes 686 ms to execute:

SELECT * FROM geolocations
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry =
     CONCAT(geolocations.ancestry, '/', geolocations.id)
)

This query executes in 808 ms:

SELECT * FROM geolocations
WHERE location_type = 2

When I combine these two queries with OR, it takes about 4475 ms to complete, if it completes at all.

SELECT * FROM geolocations
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry =
     CONCAT(geolocations.ancestry, '/', geolocations.id)
) OR location_type = 2

EXPLAIN:

[
 {
   "Plan": {
     "Node Type": "Seq Scan",
     "Parallel Aware": false,
     "Relation Name": "geolocations",
     "Alias": "geolocations",
     "Startup Cost": 0,
     "Total Cost": 2760473.54,
     "Plan Rows": 582910,
     "Plan Width": 68,
     "Filter": "((SubPlan 1) OR (location_type = 2))",
     "Plans": [
       {
         "Node Type": "Index Only Scan",
         "Parent Relationship": "SubPlan",
         "Subplan Name": "SubPlan 1",
         "Parallel Aware": false,
         "Scan Direction": "Forward",
         "Index Name": "index_geolocations_on_ancestry",
         "Relation Name": "geolocations",
         "Alias": "g2",
         "Startup Cost": 0.43,
         "Total Cost": 124.91,
         "Plan Rows": 30,
         "Plan Width": 0,
         "Index Cond": "(ancestry = concat(geolocations.ancestry, '/', geolocations.id))"
       }
     ]
   },
   "JIT": {
     "Worker Number": -1,
     "Functions": 8,
     "Options": {
       "Inlining": true,
       "Optimization": true,
       "Expressions": true,
       "Deforming": true
     }
   }
 }
]
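The plan above is a Seq Scan whose filter is ((SubPlan 1) OR (location_type = 2)). SubPlan 1 is an Index Only Scan on index_geolocations_on_ancestry. Because it is the left operand of the OR, it runs once for every scanned row: roughly 600,000 index probes on this table. The rough Python sketch below mimics that execution shape. A sorted list searched with `bisect` stands in for the btree index, which is a simplification for illustration; the rows are hypothetical.

```python
import bisect
from typing import Optional

def child_ancestry(ancestry: Optional[str], row_id: int) -> str:
    # Mirrors CONCAT(ancestry, '/', id); CONCAT treats NULL as ''.
    return f"{ancestry or ''}/{row_id}"

def or_filter_with_subplan(rows):
    """Seq Scan with Filter: ((SubPlan 1) OR (location_type = 2)).

    rows: list of (id, location_type, ancestry) tuples.
    Returns the matching ids and the number of index probes made.
    """
    # Stand-in for the btree index on ancestry: a sorted list of values.
    index = sorted(anc for _, _, anc in rows if anc is not None)
    probes = 0
    result = []
    for rid, loc_type, anc in rows:
        # SubPlan 1 is the left operand of the OR, so it runs for every row.
        key = child_ancestry(anc, rid)
        probes += 1
        pos = bisect.bisect_left(index, key)
        has_child = pos < len(index) and index[pos] == key
        if has_child or loc_type == 2:
            result.append(rid)
    return result, probes

# Hypothetical rows: (id, location_type, ancestry)
rows = [(1, 1, None), (2, 2, "/1"), (3, 3, "/1/2"), (4, 2, "/1")]
ids, probes = or_filter_with_subplan(rows)
print(ids, probes)  # [1, 2, 4] 4
```

The probe count equals the row count no matter how many rows match location_type = 2.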

Combining them with UNION takes 1916 ms:

SELECT * FROM geolocations
WHERE EXISTS (
  SELECT 1 FROM geolocations g2
  WHERE g2.ancestry =
     CONCAT(geolocations.ancestry, '/', geolocations.id)
) UNION SELECT * FROM geolocations WHERE location_type = 2
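The sketch below checks that the UNION rewrite returns the same rows as the OR query. It runs both against a tiny in-memory table using Python's built-in `sqlite3` module. SQLite is only a stand-in: its planner differs from PostgreSQL's, so this compares results, not speed. The data is hypothetical. COALESCE(...) || ... replaces CONCAT because SQLite versions before 3.44 lack CONCAT. The sketch also includes a UNION ALL variant, which is not from the question. UNION has to remove duplicates (rows matching both conditions), while UNION ALL with disjoint branches can skip that step.

```python
import sqlite3

# In-memory stand-in for the geolocations table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geolocations (
        id INTEGER PRIMARY KEY,
        name TEXT,
        location_type INTEGER,
        ancestry TEXT
    );
    CREATE INDEX index_geolocations_on_ancestry ON geolocations (ancestry);
    INSERT INTO geolocations VALUES
        (1, 'root', 1, NULL),
        (2, 'a',    2, '/1'),
        (3, 'b',    3, '/1/2'),
        (4, 'c',    2, '/1');
""")

# COALESCE(...) || ... mimics PostgreSQL CONCAT, which treats NULL as ''.
has_child = """EXISTS (
    SELECT 1 FROM geolocations g2
    WHERE g2.ancestry = COALESCE(geolocations.ancestry, '') || '/' || geolocations.id
)"""

queries = {
    "or": f"SELECT * FROM geolocations WHERE {has_child} OR location_type = 2",
    "union": f"""SELECT * FROM geolocations WHERE {has_child}
                 UNION SELECT * FROM geolocations WHERE location_type = 2""",
    # Disjoint branches, so no duplicate removal is needed.
    # (PostgreSQL spelling: location_type IS DISTINCT FROM 2.)
    "union_all": f"""SELECT * FROM geolocations WHERE location_type = 2
                     UNION ALL
                     SELECT * FROM geolocations
                     WHERE {has_child} AND location_type IS NOT 2""",
}

# Compare the sorted ids each query returns.
results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in queries.items()}
print(results)  # {'or': [1, 2, 4], 'union': [1, 2, 4], 'union_all': [1, 2, 4]}
```

The UNION ALL variant relies on the second branch excluding exactly the rows the first branch returns. That is why it uses a null-safe comparison rather than `<> 2`.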

EXPLAIN:

[
 {
   "Plan": {
     "Node Type": "Unique",
     "Parallel Aware": false,
     "Startup Cost": 308693.44,
     "Total Cost": 332506.74,
     "Plan Rows": 865938,
     "Plan Width": 188,
     "Plans": [
       {
         "Node Type": "Sort",
         "Parent Relationship": "Outer",
         "Parallel Aware": false,
         "Startup Cost": 308693.44,
         "Total Cost": 310858.29,
         "Plan Rows": 865938,
         "Plan Width": 188,
         "Sort Key": [
           "geolocations.id",
           "geolocations.name",
           "geolocations.location_type",
           "geolocations.pricing",
           "geolocations.ancestry",
           "geolocations.geolocationable_id",
           "geolocations.geolocationable_type",
           "geolocations.created_at",
           "geolocations.updated_at",
           "geolocations.info"
         ],
         "Plans": [
           {
             "Node Type": "Append",
             "Parent Relationship": "Outer",
             "Parallel Aware": false,
             "Startup Cost": 15851.41,
             "Total Cost": 63464.05,
             "Plan Rows": 865938,
             "Plan Width": 188,
             "Subplans Removed": 0,
             "Plans": [
               {
                 "Node Type": "Hash Join",
                 "Parent Relationship": "Member",
                 "Parallel Aware": false,
                 "Join Type": "Inner",
                 "Startup Cost": 15851.41,
                 "Total Cost": 35074.94,
                 "Plan Rows": 299882,
                 "Plan Width": 68,
                 "Inner Unique": true,
                 "Hash Cond": "(concat(geolocations.ancestry, '/', geolocations.id) = (g2.ancestry)::text)",
                 "Plans": [
                   {
                     "Node Type": "Seq Scan",
                     "Parent Relationship": "Outer",
                     "Parallel Aware": false,
                     "Relation Name": "geolocations",
                     "Alias": "geolocations",
                     "Startup Cost": 0,
                     "Total Cost": 13900.63,
                     "Plan Rows": 599763,
                     "Plan Width": 68
                   },
                   {
                     "Node Type": "Hash",
                     "Parent Relationship": "Inner",
                     "Parallel Aware": false,
                     "Startup Cost": 15600.65,
                     "Total Cost": 15600.65,
                     "Plan Rows": 20061,
                     "Plan Width": 12,
                     "Plans": [
                       {
                         "Node Type": "Aggregate",
                         "Strategy": "Hashed",
                         "Partial Mode": "Simple",
                         "Parent Relationship": "Outer",
                         "Parallel Aware": false,
                         "Startup Cost": 15400.04,
                         "Total Cost": 15600.65,
                         "Plan Rows": 20061,
                         "Plan Width": 12,
                         "Group Key": [
                           "(g2.ancestry)::text"
                         ],
                         "Plans": [
                           {
                             "Node Type": "Seq Scan",
                             "Parent Relationship": "Outer",
                             "Parallel Aware": false,
                             "Relation Name": "geolocations",
                             "Alias": "g2",
                             "Startup Cost": 0,
                             "Total Cost": 13900.63,
                             "Plan Rows": 599763,
                             "Plan Width": 12
                           }
                         ]
                       }
                     ]
                   }
                 ]
               },
               {
                 "Node Type": "Seq Scan",
                 "Parent Relationship": "Member",
                 "Parallel Aware": false,
                 "Relation Name": "geolocations",
                 "Alias": "geolocations_1",
                 "Startup Cost": 0,
                 "Total Cost": 15400.04,
                 "Plan Rows": 566056,
                 "Plan Width": 68,
                 "Filter": "(location_type = 2)"
               }
             ]
           }
         ]
       }
     ]
   },
   "JIT": {
     "Worker Number": -1,
     "Functions": 15,
     "Options": {
       "Inlining": false,
       "Optimization": false,
       "Expressions": true,
       "Deforming": true
     }
   }
 }
]
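In this plan, the EXISTS branch is planned as a join. The Aggregate (Strategy "Hashed") collects the distinct g2.ancestry values once, and the Hash Join probes that hash table for each outer row. Append then adds the location_type = 2 rows, and Sort + Unique remove duplicates. The minimal Python sketch below mirrors that shape, with Python sets and sorting standing in for the executor nodes and hypothetical rows.

```python
from typing import Optional

def child_ancestry(ancestry: Optional[str], row_id: int) -> str:
    # Mirrors CONCAT(ancestry, '/', id); CONCAT treats NULL as ''.
    return f"{ancestry or ''}/{row_id}"

def union_plan(rows):
    """Sketch of the UNION plan; rows are (id, location_type, ancestry)."""
    # Aggregate (Hashed) on g2.ancestry feeding the Hash node: built once.
    hashed = {anc for _, _, anc in rows if anc is not None}
    # Hash Join branch: one hash lookup per outer row, no index probe.
    exists_branch = [r for r in rows if child_ancestry(r[2], r[0]) in hashed]
    # Second branch: Seq Scan with Filter (location_type = 2).
    type2_branch = [r for r in rows if r[1] == 2]
    # Append both branches, then Sort + Unique to drop duplicate rows.
    # Sorting by id is enough because id is unique in this sample;
    # the real plan sorts on every column.
    result = []
    for row in sorted(exists_branch + type2_branch, key=lambda r: r[0]):
        if not result or result[-1] != row:
            result.append(row)
    return [r[0] for r in result]

# Hypothetical rows: (id, location_type, ancestry)
rows = [(1, 1, None), (2, 2, "/1"), (3, 3, "/1/2"), (4, 2, "/1")]
print(union_plan(rows))  # [1, 2, 4]
```

The hashing work is done once up front instead of one index probe per row. The price is the final sort over all appended rows, which in the real plan covers about 865,938 estimated rows.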

Why does PostgreSQL execute the OR query so much more slowly?

PostgreSQL, like many other RDBMSs, often struggles with OR predicates.

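What often happens, and what happens in this case, is that the planner decides it cannot satisfy both sides of the OR with a single lookup. Instead, it scans the whole table (the Seq Scan on geolocations in the first plan) and evaluates both (or all) conditions for every row. Because the EXISTS sits under an OR, the planner does not convert it into a join. It stays a correlated subquery (SubPlan 1) that runs an index probe for each scanned row, whereas in the UNION version the EXISTS branch stands alone and becomes a Hash Join.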

This happens even though, to a human, the index-union approach is the more obvious one.

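What you are doing is a very common trick to help the planner and force an index union. Each side is now planned and evaluated completely separately, which in this case is much faster.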

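It will not always be faster, for example when location_type = 2 matches a large part of the table. The benefit is most obvious when the two conditions perform very differently. In the UNION plan above, the planner estimates that 566,056 of 599,763 rows have location_type = 2. That branch is therefore a full Seq Scan anyway, and UNION must then sort about 865,938 rows on every column to remove duplicates.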

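For example, in WHERE id = @id OR someName = @name, the first condition is a direct seek to a single row, while the second is a seek to a few rows. The planner cannot satisfy both with a single lookup, so it often falls back to scanning the whole table. An index union helps here because one index on id and another on someName can each be used. PostgreSQL can sometimes do this on its own with a BitmapOr over two bitmap index scans, but only when each side of the OR is a simple indexable condition, which an EXISTS subquery is not.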
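The index-union idea can be sketched in Python, with dicts standing in for the two indexes. This is a simplification, and the table and values are hypothetical. Each condition is answered by its own index, and the matching row positions are unioned, so a row satisfying both conditions is fetched once. The result is the same as a full scan that tests both conditions on every row.

```python
from collections import defaultdict

# Hypothetical table: row position -> (id, someName)
table = [(10, "alpha"), (11, "beta"), (12, "alpha"), (13, "gamma")]

# Two separate "indexes": value -> set of row positions.
index_on_id = {}
index_on_name = defaultdict(set)
for pos, (row_id, name) in enumerate(table):
    index_on_id[row_id] = {pos}  # id is unique
    index_on_name[name].add(pos)

def index_union(id_value, name_value):
    """WHERE id = id_value OR someName = name_value via two index lookups."""
    positions = index_on_id.get(id_value, set()) | index_on_name.get(name_value, set())
    # Fetch rows in position order, each at most once.
    return [table[pos] for pos in sorted(positions)]

def full_scan(id_value, name_value):
    """The same predicate evaluated against every row."""
    return [row for row in table if row[0] == id_value or row[1] == name_value]

print(index_union(13, "alpha"))  # [(10, 'alpha'), (12, 'alpha'), (13, 'gamma')]
```

The index version touches only matching positions, while `full_scan` reads every row. The gap grows with table size, which is why the trick pays off most when both conditions are selective.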

Source: https://dba.stackexchange.com/questions/293836