Mysql
儲存過程需要很長時間才能完成
我們有一個
csv
有 2000 行的文件和一個table1
大約 200 萬行的數據庫,另一個table2
有 60000 行的數據庫。我們需要
table1
根據 csv 文件中的參數進行查詢,因此中的每一行都csv
應該執行一個選擇查詢。最初,我們嘗試for loop
在應用程序的 a 中執行以下查詢:SELECT table1.c1, table1.c2, ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]), point(table1.c3, table1.c4)) * 0.2 AS length, table1.c5 FROM table1 WHERE table1.c1 IN (SELECT DISTINCT table2.c1 FROM table2 LEFT JOIN table2.c1=table3.c1 WHERE table2.c2=1 AND table2.c3 BETWEEN 1000 AND 2000) HAVING length < csv.c4[i] AND table1.c5 BETWEEN date("start_date") AND date("end_date") ORDER BY table.c1
這
csv.c1[i]
i
實際上是循環索引。由於到 MySQL 伺服器的 2000 次往返,因此需要很長時間才能完成。完成查詢大約需要 16 個小時。所以,我寫了下面的 SP 以避免循環,並簡單地呼叫這個 SP 以便可以在 MySQL 伺服器中進行循環:
CREATE PROCEDURE sp() BEGIN DECLARE arg0 VARCHAR(255); DECLARE arg1 FLOAT; DECLARE arg2 FLOAT; DECLARE arg3 FLOAT; DECLARE cur1 CURSOR FOR SELECT * FROM csv_based_table; OPEN cur1; read_loop: LOOP FETCH cur1 INTO arg0, arg1, arg2, arg3; SELECT col0, col1, ST_Distance_Sphere(point(arg1, arg2),point(col3,col4))*2 AS length, arg0 FROM important_table1 WHERE col0 IN (SELECT DISTINCT c0 FROM important_table2 LEFT JOIN table3 ON col0 = c0 WHERE c1=1 and c2 BETWEEN 1000 AND 2000) HAVING distance < arg3 AND col5 BETWEEN date("start_date") AND date("end_date") ORDER BY arg0, col0; END LOOP; CLOSE cur1; END;
因此,我們不使用 csv 文件,而是為該 csv 文件創建一個數據庫表並執行上述儲存過程。問題是這比原來的 2000 次迭代循環慢。在超過 100 個 csv 行的測試中,基於循環的原始解決方案在317.6 秒內完成,而儲存過程僅對 SP 本身就花費了**321.5 秒。**為什麼會發生這種情況,我該如何優化?
這裡有一個建議:
CREATE TEMPORARY TABLE csv ( i int not null , c1 ... not null , c2 ... not null , c3 ... not null , c4 ... not null )
將 csv 文件載入到臨時表中,並使用連接獲取結果。
SELECT csv.i table1.c1, table1.c2, csv.st_dist_spere, ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]), point(table1.c3, table1.c4)) * 0.2 AS length, table1.c5 FROM table1 CROSS JOIN csv WHERE table1.c1 IN ( SELECT DISTINCT table2.c1 FROM table2 LEFT JOIN table2.c1=table3.c1 WHERE table2.c2=1 AND table2.c3 BETWEEN 1000 AND 2000 HAVING length < csv.c4 AND table1.c5 BETWEEN date(start_date) AND date(end_date) ORDER BY csv.i, table.c1
然後我會看看
IN
謂詞(我假設這是它應該是的):SELECT DISTINCT table2.c1 FROM table2 LEFT JOIN table3 ON table2.c1=table3.c1 WHERE table2.c2=1 AND table2.c3 BETWEEN 1000 AND 2000
可以消除針對 table3 的外連接(智能優化器應該意識到這一點,但我不知道這是否在 MySQL 中實現):
SELECT DISTINCT table2.c1 FROM table2 WHERE table2.c2=1 AND table2.c3 BETWEEN 1000 AND 2000
這給我們留下了:
SELECT csv.i table1.c1, table1.c2, csv.st_dist_spere, ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]), point(table1.c3, table1.c4)) * 0.2 AS length, table1.c5 FROM table1 CROSS JOIN csv WHERE table1.c1 IN ( SELECT DISTINCT table2.c1 FROM table2 WHERE table2.c2=1 AND table2.c3 BETWEEN 1000 AND 2000 ) HAVING length < csv.c4 AND table1.c5 BETWEEN date(start_date) AND date(end_date) ORDER BY csv.i, table.c1
下一步,嘗試將 替換
IN PREDICATE
為JOIN
:SELECT csv.i table1.c1, table1.c2, csv.st_dist_spere, ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]), point(table1.c3, table1.c4)) * 0.2 AS length, table1.c5 FROM table1 CROSS JOIN csv JOIN ( SELECT DISTINCT table2.c1 FROM table2 WHERE table2.c2=1 AND table2.c3 BETWEEN 1000 AND 2000 ) as X ON X.c1 = table1.c1 HAVING length < csv.c4 AND table1.c5 BETWEEN date(start_date) AND date(end_date) ORDER BY csv.i, table.c1
在進行交叉連接之前將範圍謂詞推入 table1 可能是有利的:
SELECT csv.i table1.c1, table1.c2, csv.st_dist_spere, ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]), point(table1.c3, table1.c4)) * 0.2 AS length, table1.c5 FROM ( select ... from table1 where table1.c5 BETWEEN date(start_date) AND date(end_date) ) as table1 CROSS JOIN csv [....]