MySQL

Speeding up a LEFT JOIN that uses an OR operator in the ON clause

  • July 3, 2021

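I have a query that uses a LEFT JOIN with an OR operator in the ON clause. With the OR operator commented out, the query runs in about 150 ms. With the OR operator left in, it takes more than 80 seconds. Does anyone know how to speed it up? Details below.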

supplier_responses
+----+-------------+-------------+----------+-------+
| id | internal_id | supplier_id | supplier | price |
+----+-------------+-------------+----------+-------+
| 1  | 100         | 100         | poggle   | 10    |
| 2  | 101         | 101         | poggle   | 15    |
| 3  | 102         | 290         | lello    | 12    |
| 4  | 103         | 370         | chugs    | 10    |
| .. | ...         | ...         | ...      | ...   |
+----+-------------+-------------+----------+-------+
Indexes on internal_id, supplier_id, supplier.

supplier_updates
+----+-------------+-------------+----------+--------+--------------+
| id | internal_id | supplier_id | supplier | status | timestamp    |
+----+-------------+-------------+----------+--------+--------------+
| 1  | 100         | 100         | poggle   | 80     | 2019-01-15...|
| 2  | 100         | 100         | poggle   | 100    | 2019-01-16...|
| 3  | null        | 290         | lello    | 80     | 2019-01-16...|
| 4  | 107         | 107         | poggle   | 80     | 2019-01-17...|
| 5  | 112         | null        | chugs    | 100    | 2019-01-17...|
| 6  | null        | 100         | lello    | 100    | 2019-01-18...|
| .. | ...         | ...         | ...      | ...    | ...          |
+----+-------------+-------------+----------+--------+--------------+
Indexes on internal_id, supplier_id, supplier, timestamp.

For "poggle", internal_id and supplier_id will be the same.

SELECT * 
FROM   supplier_responses sr 
      LEFT JOIN (SELECT * 
                 FROM   supplier_updates su 
                        LEFT JOIN (SELECT supplier supplier_name, 
                                          COALESCE(internal_id, supplier_id) latest_id, 
                                          Max(timestamp) latest_timestamp 
                                   FROM   supplier_updates 
                                   GROUP  BY supplier_name, 
                                             supplier_id) suLatest 
                               ON suLatest.supplier_name = su.supplier 
                                  AND suLatest.latest_timestamp = su.timestamp 
                                  AND suLatest.latest_id = 
                                      COALESCE(su.internal_id, 
                                      supplier_id) 
                ) su 
             ON ( sr.supplier = su.supplier 
                  AND sr.internal_id = su.internal_id ) 
                 OR ( sr.supplier = su.supplier 
                      AND sr.supplier_id = su.supplier_id ) 
ORDER  BY su.timestamp_at ASC;

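An OR in a join predicate like this can force a scan because, as with many uses of functions, the result is not sargable (the engine cannot satisfy it with an index seek).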

Assuming appropriate indexes, you can often use a UNION to avoid the scan, so:

  SELECT * 
    FROM supplier_responses sr 
LEFT JOIN supplier_updates su 
      ON ( sr.supplier = su.supplier AND sr.internal_id = su.internal_id ) 
      OR ( sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id ) 
ORDER  BY su.timestamp_at ASC;

becomes:

  SELECT * 
    FROM supplier_responses sr 
LEFT JOIN supplier_updates su 
      ON ( sr.supplier = su.supplier AND sr.internal_id = su.internal_id ) 
   UNION
  SELECT * 
    FROM supplier_responses sr 
LEFT JOIN supplier_updates su 
      ON ( sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id ) 
ORDER BY timestamp_at ASC;

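This looks like more work, but if each of the two SELECTs is fast because it is properly indexed, while the OR would otherwise force a scan of an index or of the whole table, the database engine ends up doing far less work overall.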
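One thing worth checking before adopting this rewrite: with LEFT JOIN it is not quite equivalent to the OR version. If an sr row matches through internal_id but not through supplier_id, the second branch still emits that sr row padded with NULLs. The UNION therefore returns both the matched row and a NULL-padded one, while the OR join returns only the matched row. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, with a schema cut down to the join columns and hypothetical rows. It compares the OR join, the rewrite as written, and one possible fix. The fix unions two inner joins and then adds back the sr rows that matched neither predicate, and each NOT EXISTS probe uses a single indexable predicate. The query constants and the rows() helper are illustrative names only.

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; the schema keeps only the columns the join uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_responses (id INTEGER PRIMARY KEY, internal_id INT, supplier_id INT, supplier TEXT);
CREATE TABLE supplier_updates (id INTEGER PRIMARY KEY, internal_id INT, supplier_id INT, supplier TEXT);
-- Hypothetical rows: response 1 matches only via internal_id,
-- response 2 only via supplier_id, response 3 matches nothing.
INSERT INTO supplier_responses VALUES (1, 100, 900, 'poggle'), (2, 500, 290, 'lello'), (3, 777, 777, 'chugs');
INSERT INTO supplier_updates VALUES (10, 100, NULL, 'poggle'), (11, NULL, 290, 'lello');
""")

OR_JOIN = """
SELECT sr.id, su.id FROM supplier_responses sr
LEFT JOIN supplier_updates su
  ON (sr.supplier = su.supplier AND sr.internal_id = su.internal_id)
  OR (sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id)
"""

# The rewrite as written: each LEFT JOIN branch NULL-pads the responses it fails to match.
NAIVE_UNION = """
SELECT sr.id, su.id FROM supplier_responses sr
LEFT JOIN supplier_updates su ON sr.supplier = su.supplier AND sr.internal_id = su.internal_id
UNION
SELECT sr.id, su.id FROM supplier_responses sr
LEFT JOIN supplier_updates su ON sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id
"""

# Inner-join branches, plus the responses that matched neither predicate (NULL-padded once).
FIXED_UNION = """
SELECT sr.id, su.id FROM supplier_responses sr
JOIN supplier_updates su ON sr.supplier = su.supplier AND sr.internal_id = su.internal_id
UNION
SELECT sr.id, su.id FROM supplier_responses sr
JOIN supplier_updates su ON sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id
UNION
SELECT sr.id, NULL FROM supplier_responses sr
WHERE NOT EXISTS (SELECT 1 FROM supplier_updates su
                  WHERE su.supplier = sr.supplier AND su.internal_id = sr.internal_id)
  AND NOT EXISTS (SELECT 1 FROM supplier_updates su
                  WHERE su.supplier = sr.supplier AND su.supplier_id = sr.supplier_id)
"""

def rows(sql):
    """Return the query's rows sorted by repr (None is not orderable against int)."""
    return sorted(conn.execute(sql).fetchall(), key=repr)

print(rows(OR_JOIN))      # [(1, 10), (2, 11), (3, None)]
print(rows(NAIVE_UNION))  # [(1, 10), (1, None), (2, 11), (2, None), (3, None)]
print(rows(FIXED_UNION))  # [(1, 10), (2, 11), (3, None)]
```

If you can switch to an inner JOIN, the anti-join branch is unnecessary and the two-branch UNION is already equivalent.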

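If you know there will be no duplicates between the two SELECT outputs (rows that match both sr.supplier = su.supplier AND sr.internal_id = su.internal_id and sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id), you can speed things up further by using UNION ALL instead of UNION. This saves the extra sort operation needed to remove duplicates, which may otherwise end up spooling to disk.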
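Whether UNION ALL is safe depends on the data. The question states that for "poggle" internal_id and supplier_id are the same, so a poggle response will usually satisfy both predicates and come out of both branches. UNION ALL would then return it twice. The sketch below uses Python's sqlite3 as a stand-in for MySQL, with hypothetical rows modelled on the sample tables. It uses inner-join branches to keep the count simple, and count_rows() is an illustrative helper.

```python
import sqlite3

# SQLite stands in for MySQL; rows are hypothetical, modelled on the sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_responses (id INTEGER PRIMARY KEY, internal_id INT, supplier_id INT, supplier TEXT);
CREATE TABLE supplier_updates (id INTEGER PRIMARY KEY, internal_id INT, supplier_id INT, supplier TEXT);
-- For 'poggle', internal_id = supplier_id, so response 1 matches update 1 through both predicates.
INSERT INTO supplier_responses VALUES (1, 100, 100, 'poggle'), (3, 102, 290, 'lello');
INSERT INTO supplier_updates VALUES (1, 100, 100, 'poggle'), (3, NULL, 290, 'lello');
""")

# The two join branches, with the set operator left as a placeholder.
BRANCHES = """
SELECT sr.id AS sr_id, su.id AS su_id FROM supplier_responses sr
JOIN supplier_updates su ON sr.supplier = su.supplier AND sr.internal_id = su.internal_id
{op}
SELECT sr.id, su.id FROM supplier_responses sr
JOIN supplier_updates su ON sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id
"""

def count_rows(op):
    """Count the rows produced when the two branches are combined with `op`."""
    sql = f"SELECT COUNT(*) FROM ({BRANCHES.format(op=op)}) AS combined"
    return conn.execute(sql).fetchone()[0]

union_rows = count_rows("UNION")          # the (1, 1) pair is deduplicated
union_all_rows = count_rows("UNION ALL")  # the (1, 1) pair appears twice
print(union_rows, union_all_rows)  # 2 3
```

So for this schema, UNION ALL is only safe if the second branch also excludes rows the first branch already matched, for example by adding AND NOT (sr.internal_id <=> su.internal_id) in MySQL.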

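I simplified your example slightly for brevity. Your sub-select for the derived table su may need to be broken out into a CTE or a view so that the code is not repeated, because repeated code invites partial-edit mistakes later. Of course, the subquery itself and its sub-subquery may cause performance problems, depending on how clever the MySQL query planner is, especially now that it is hit twice. However, I have stuck to dealing specifically with the JOIN ... ON ... OR ... problem.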
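A minimal sketch of the CTE route follows. It assumes MySQL 8.0 or later, because WITH is not available in 5.7. It uses Python's sqlite3 as a stand-in, with a few hypothetical rows, and the CTE name su_latest is made up. The sketch makes two deliberate changes to the question's subquery. First, it groups by the same COALESCE expression that it selects. Second, it uses an inner JOIN to the grouped subquery. With the question's LEFT JOIN, every update row survives, so the "latest" filter has no effect.

```python
import sqlite3

# SQLite stands in for MySQL 8.0+ (WITH is only supported from MySQL 8.0 onward).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_responses (id INTEGER PRIMARY KEY, internal_id INT, supplier_id INT, supplier TEXT);
CREATE TABLE supplier_updates (id INTEGER PRIMARY KEY, internal_id INT, supplier_id INT,
                               supplier TEXT, status INT, timestamp TEXT);
INSERT INTO supplier_responses VALUES (1, 100, 100, 'poggle'), (3, 102, 290, 'lello');
INSERT INTO supplier_updates VALUES
  (1, 100, 100, 'poggle', 80, '2019-01-15'),
  (2, 100, 100, 'poggle', 100, '2019-01-16'),
  (3, NULL, 290, 'lello', 80, '2019-01-16');
""")

QUERY = """
WITH su_latest AS (  -- hypothetical CTE name; the derived table is written only once
  SELECT su.*
  FROM supplier_updates su
  JOIN (SELECT supplier,
               COALESCE(internal_id, supplier_id) AS latest_id,
               MAX(timestamp) AS latest_timestamp
        FROM supplier_updates
        GROUP BY supplier, COALESCE(internal_id, supplier_id)) m
    ON m.supplier = su.supplier
   AND m.latest_timestamp = su.timestamp
   AND m.latest_id = COALESCE(su.internal_id, su.supplier_id)
)
SELECT sr.id AS sr_id, su.id AS su_id, su.status
FROM supplier_responses sr
JOIN su_latest su ON sr.supplier = su.supplier AND sr.internal_id = su.internal_id
UNION
SELECT sr.id, su.id, su.status
FROM supplier_responses sr
JOIN su_latest su ON sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id
ORDER BY sr_id
"""

results = conn.execute(QUERY).fetchall()
print(results)  # [(1, 2, 100), (3, 3, 80)]
```

Update 1 is dropped because it is not the latest for poggle/100. Both branches read from the same named CTE, so an edit to the "latest" logic only has to be made once.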

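  • Change LEFT JOIN to JOIN unless you really need the LEFT.
  • Clean up things like ( sr.supplier_id = su.supplier_id AND sr.supplier_id = su.supplier_id ), which tests the same comparison twice.
  • Change the OR into a UNION (this takes a fair amount of rewriting).
  • Provide SHOW CREATE TABLE (so we can see whether you have suitable indexes).
  • Don't leave out the other conditions; they may be part of the problem.
  • Improve the pretty-printing, perhaps.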

Like this:

SELECT  *
   FROM  supplier_responses sr
   LEFT JOIN  
       (
       SELECT  *
           FROM  supplier_updates su
           LEFT JOIN  
               (
               SELECT  supplier_id supplierId,
                       coalesce(internal_id, supplier_id) latest_id,
                       Max(timestamp) latest_timestamp
                   FROM  supplier_updates
                   GROUP BY  supplier_id, supplier_id
               ) suLatest  ON suLatest.supplierId = su.supplier_id
                        AND  suLatest.latest_timestamp = su.timestamp
                        AND  suLatest.latest_id = coalesce(su.internal_id, supplier_id)
       ) su  ON ( sr.supplier_id = su.supplier_id
             AND  sr.internal_id = su.internal_id 
                )
     OR  (        sr.supplier_id = su.supplier_id
             AND  sr.supplier_id = su.supplier_id 
         ) --
   WHERE  some clause...
   ORDER BY  su.timestamp_at ASC;

Perhaps a composite index such as INDEX(supplier_id, internal_id) would be beneficial.
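The sketch below adds that composite index and asks the planner whether it would use the index for a lookup on both columns. SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN, and the output format is different. The index name idx_supplier_internal is hypothetical. In MySQL the equivalent DDL would be ALTER TABLE supplier_updates ADD INDEX idx_supplier_internal (supplier_id, internal_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_updates (id INTEGER PRIMARY KEY, internal_id INT, supplier_id INT, supplier TEXT);
-- Hypothetical name for the composite index suggested above.
CREATE INDEX idx_supplier_internal ON supplier_updates (supplier_id, internal_id);
""")

# Ask the planner how it would resolve equality lookups on both indexed columns,
# which is the shape of the ON condition sr.supplier_id = su.supplier_id AND sr.internal_id = su.internal_id.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM supplier_updates WHERE supplier_id = ? AND internal_id = ?",
    (100, 100),
).fetchall()
details = [row[-1] for row in plan]  # the last column holds the human-readable step description
print(details)
```

The exact wording of the plan text varies between SQLite versions, but it should name idx_supplier_internal. In MySQL, the key column of EXPLAIN output serves the same purpose.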

Source: https://dba.stackexchange.com/questions/227238