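MySQL
Speeding up a LEFT JOIN that uses an OR operator in the ON clause
I have a query that uses a LEFT JOIN with an OR operator in the ON clause. With the OR commented out, the query runs in about 150 ms. With the OR left in, it takes more than 80 seconds to execute. Does anyone know how to speed this up? Details below.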
supplier_responses

+----+-------------+-------------+----------+-------+
| id | internal_id | supplier_id | supplier | price |
+----+-------------+-------------+----------+-------+
| 1  | 100         | 100         | poggle   | 10    |
| 2  | 101         | 101         | poggle   | 15    |
| 3  | 102         | 290         | lello    | 12    |
| 4  | 103         | 370         | chugs    | 10    |
| .. | ...         | ...         | ...      | ...   |
+----+-------------+-------------+----------+-------+

Indexes on internal_id, supplier_id, supplier.

supplier_updates

+----+-------------+-------------+----------+--------+--------------+
| id | internal_id | supplier_id | supplier | status | timestamp    |
+----+-------------+-------------+----------+--------+--------------+
| 1  | 100         | 100         | poggle   | 80     | 2019-01-15...|
| 2  | 100         | 100         | poggle   | 100    | 2019-01-16...|
| 3  | null        | 290         | lello    | 80     | 2019-01-16...|
| 4  | 107         | 107         | poggle   | 80     | 2019-01-17...|
| 5  | 112         | null        | chugs    | 100    | 2019-01-17...|
| 6  | null        | 100         | lello    | 100    | 2019-01-18...|
| .. | ...         | ...         | ...      | ...    | ...          |
+----+-------------+-------------+----------+--------+--------------+

Indexes on internal_id, supplier_id, supplier, timestamp.
For "poggle", internal_id and supplier_id will always be the same.
    SELECT *
    FROM supplier_responses sr
    LEFT JOIN (
        SELECT *
        FROM supplier_updates su
        LEFT JOIN (
            SELECT supplier supplier_name,
                   COALESCE(internal_id, supplier_id) latest_id,
                   MAX(timestamp) latest_timestamp
            FROM supplier_updates
            GROUP BY supplier_name, supplier_id
        ) suLatest
          ON  suLatest.supplier_name = su.supplier
          AND suLatest.latest_timestamp = su.timestamp
          AND suLatest.latest_id = COALESCE(su.internal_id, supplier_id)
    ) su
      ON (     sr.supplier = su.supplier
           AND sr.internal_id = su.internal_id )
      OR (     sr.supplier = su.supplier
           AND sr.supplier_id = su.supplier_id )
    ORDER BY su.timestamp_at ASC;
An OR like this in a join predicate can force a scan because, as with many uses of functions, the result is not sargable. Assuming suitable indexes exist, you can usually avoid the scan with a UNION, so:

    SELECT *
    FROM supplier_responses sr
    LEFT JOIN supplier_updates su
      ON (     sr.supplier = su.supplier
           AND sr.internal_id = su.internal_id )
      OR (     sr.supplier = su.supplier
           AND sr.supplier_id = su.supplier_id )
    ORDER BY su.timestamp_at ASC;

becomes:

    SELECT *
    FROM supplier_responses sr
    LEFT JOIN supplier_updates su
      ON (     sr.supplier = su.supplier
           AND sr.internal_id = su.internal_id )
    UNION
    SELECT *
    FROM supplier_responses sr
    LEFT JOIN supplier_updates su
      ON (     sr.supplier = su.supplier
           AND sr.supplier_id = su.supplier_id )
    ORDER BY timestamp_at ASC;
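This looks like rather more work, but if each of the SELECTs is faster because it is properly indexed, where the OR would otherwise force a scan of an index or of the whole table, then it works out as significantly less work for the database engine.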
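One thing to check before adopting the UNION form: each branch is still a LEFT JOIN, so a supplier_responses row that matches on only one of the two conditions also comes back from the other branch padded with NULLs, a row the original OR join would not have produced. If that matters, a possible workaround is to collect the matching id pairs with two indexable inner joins and then attach them with LEFT JOINs. This is a sketch only: it assumes id is the primary key of both tables, the aliases sr1, su1, sr2, su2 and m are made up for illustration, and it sorts on timestamp (the column name in the table listing) rather than timestamp_at.

```sql
-- Sketch: keeps the "every response row appears at least once" behaviour of
-- the original LEFT JOIN ... OR ... while letting each branch use its index.
-- Assumes supplier_responses.id and supplier_updates.id are primary keys.
SELECT sr.*, su.*
FROM supplier_responses sr
LEFT JOIN (
    -- Pairs that satisfy the first half of the OR
    SELECT sr1.id AS sr_id, su1.id AS su_id
    FROM supplier_responses sr1
    JOIN supplier_updates su1
      ON  sr1.supplier    = su1.supplier
      AND sr1.internal_id = su1.internal_id
    UNION   -- removes pairs that satisfy both halves
    -- Pairs that satisfy the second half of the OR
    SELECT sr2.id, su2.id
    FROM supplier_responses sr2
    JOIN supplier_updates su2
      ON  sr2.supplier    = su2.supplier
      AND sr2.supplier_id = su2.supplier_id
) m  ON m.sr_id = sr.id
LEFT JOIN supplier_updates su ON su.id = m.su_id
ORDER BY su.timestamp ASC;
```

Comparing row counts and EXPLAIN output for this form against the original on real data is the safest way to confirm that it is both equivalent and faster.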
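If you know there will be no duplicates between the two outputs (rows where both sr.supplier = su.supplier AND sr.internal_id = su.internal_id and sr.supplier = su.supplier AND sr.supplier_id = su.supplier_id match), you can speed things up further by using UNION ALL instead of UNION, which saves the extra de-duplication step (a sort that might end up spilling to disk).

I have simplified your example a little for brevity. The sub-select for your derived table su may need to be factored out into a CTE or a view to avoid repeating that code (repetition tends to invite partial-edit bugs later). Of course, the sub-query itself and its sub-sub-query may be causing performance problems of their own, depending on how bright MySQL's query planner is, especially now that the sub-query is being hit twice, but I have stuck to dealing specifically with the JOIN ... ON ... OR ... issue.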
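To keep the derived table from being written out twice, it could be named once in a CTE (this needs MySQL 8.0 or later; on older versions a view would play the same role). What follows is only a sketch under several assumptions: the CTE name su_with_latest is made up, supplier_updates is assumed to have only the columns shown in the question, the inner query groups by the same COALESCE expression it selects (my guess at the intent, and needed if ONLY_FULL_GROUP_BY is enabled), and the final sort uses timestamp, the column name in the table listing.

```sql
-- Requires MySQL 8.0+ for WITH; su_with_latest is a hypothetical name.
WITH su_with_latest AS (
    SELECT *
    FROM supplier_updates su
    LEFT JOIN (
        SELECT supplier AS supplier_name,
               COALESCE(internal_id, supplier_id) AS latest_id,
               MAX(timestamp) AS latest_timestamp
        FROM supplier_updates
        -- Assumption: group by the expression that is selected
        GROUP BY supplier, COALESCE(internal_id, supplier_id)
    ) suLatest
      ON  suLatest.supplier_name    = su.supplier
      AND suLatest.latest_timestamp = su.timestamp
      AND suLatest.latest_id        = COALESCE(su.internal_id, su.supplier_id)
)
-- Branch 1: match on internal_id
SELECT *
FROM supplier_responses sr
LEFT JOIN su_with_latest su
  ON  sr.supplier    = su.supplier
  AND sr.internal_id = su.internal_id
UNION
-- Branch 2: match on supplier_id
SELECT *
FROM supplier_responses sr
LEFT JOIN su_with_latest su
  ON  sr.supplier    = su.supplier
  AND sr.supplier_id = su.supplier_id
ORDER BY timestamp ASC;
```

Whether the optimizer materializes the CTE once or merges it into each branch depends on the version and the plan it chooses, so EXPLAIN is worth a look before relying on this.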
- Unless you really need LEFT, change LEFT JOIN to JOIN.
- Clean up things like ( sr.supplier_id = su.supplier_id AND sr.supplier_id = su.supplier_id )
- Change the OR into a UNION (a lot of rewriting).
- Provide SHOW CREATE TABLE (so we can see whether you have suitable indexes).
- Don't leave out the other conditions; they may be part of the problem.
- Improve the pretty-printing, perhaps like this:

    SELECT *
    FROM supplier_responses sr
    LEFT JOIN (
        SELECT *
        FROM supplier_updates su
        LEFT JOIN (
            SELECT supplier_id supplierId,
                   COALESCE(internal_id, supplier_id) latest_id,
                   MAX(timestamp) latest_timestamp
            FROM supplier_updates
            GROUP BY supplier_id, supplier_id
        ) suLatest
          ON  suLatest.supplierId = su.supplier_id
          AND suLatest.latest_timestamp = su.timestamp
          AND suLatest.latest_id = COALESCE(su.internal_id, supplier_id)
    ) su
      ON (     sr.supplier_id = su.supplier_id
           AND sr.internal_id = su.internal_id )
      OR (     sr.supplier_id = su.supplier_id
           AND sr.supplier_id = su.supplier_id )
    -- WHERE some clause...
    ORDER BY su.timestamp_at ASC;
A composite index such as INDEX(supplier_id, internal_id) might also be beneficial.
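As a rough sketch of how that suggestion could be tried out: the index names below are made up, and the second statement is an assumption about which column orders would suit the question's version of the join (which compares supplier rather than supplier_id), not something that has been measured.

```sql
-- Composite index from the suggestion above (index name is hypothetical).
ALTER TABLE supplier_updates
    ADD INDEX idx_su_supplier_internal (supplier_id, internal_id);

-- If the join compares the supplier name column, as in the question,
-- one index per UNION branch may be what helps (an assumption).
ALTER TABLE supplier_updates
    ADD INDEX idx_su_name_internal (supplier, internal_id),
    ADD INDEX idx_su_name_supplier_id (supplier, supplier_id);

-- Check whether the optimizer actually uses an index for one branch.
EXPLAIN
SELECT *
FROM supplier_responses sr
JOIN supplier_updates su
  ON  sr.supplier    = su.supplier
  AND sr.internal_id = su.internal_id;
```

In the EXPLAIN output, the key column for su shows which index, if any, was chosen; a type of ALL there would mean the table is still being scanned.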