Mysql

儲存過程需要很長時間才能完成

  • December 31, 2017

我們有一個csv有 2000 行的文件和一個table1大約 200 萬行的數據庫,另一個table2有 60000 行的數據庫。

我們需要table1根據 csv 文件中的參數進行查詢,因此中的每一行都csv應該執行一個選擇查詢。最初,我們嘗試for loop在應用程序的 a 中執行以下查詢:

SELECT table1.c1, 
      table1.c2, 
      ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]), 
      point(table1.c3, table1.c4)) * 0.2 AS length, 
      table1.c5 
FROM   table1 
WHERE  table1.c1 IN (SELECT DISTINCT table2.c1 
                    FROM      table2 
                    LEFT JOIN table2.c1=table3.c1 
                    WHERE table2.c2=1 
                    AND   table2.c3 BETWEEN 1000 AND 2000) 
HAVING length < csv.c4[i] 
AND    table1.c5 BETWEEN date("start_date") AND date("end_date") 
ORDER BY table.c1

csv.c1[i] i實際上是循環索引。由於到 MySQL 伺服器的 2000 次往返,因此需要很長時間才能完成。完成查詢大約需要 16 個小時。

所以,我寫了下面的 SP 以避免循環,並簡單地呼叫這個 SP 以便可以在 MySQL 伺服器中進行循環:

CREATE PROCEDURE sp() 
   BEGIN 
     DECLARE arg0 VARCHAR(255);
     DECLARE arg1 FLOAT;
     DECLARE arg2 FLOAT;
     DECLARE arg3 FLOAT;
     DECLARE cur1 CURSOR FOR SELECT * FROM csv_based_table;
     OPEN cur1; 
     read_loop: LOOP
        FETCH cur1 INTO arg0, arg1, arg2, arg3;
        SELECT col0, 
               col1, 
               ST_Distance_Sphere(point(arg1, arg2),point(col3,col4))*2 AS length, 
               arg0 
        FROM   important_table1 
        WHERE  col0 IN (SELECT DISTINCT c0 
                        FROM   important_table2 
                        LEFT JOIN table3 
                        ON col0 = c0 
                        WHERE c1=1 
                        and c2 BETWEEN 1000 AND 2000) 
        HAVING distance < arg3 
        AND col5 BETWEEN date("start_date") AND date("end_date") 
        ORDER BY arg0, col0;
     END LOOP;
     CLOSE cur1;
   END;

因此,我們不使用 csv 文件,而是為該 csv 文件創建一個數據庫表並執行上述儲存過程。問題是這比原來的 2000 次迭代循環慢。在超過 100 個 csv 行的測試中,基於循環的原始解決方案在317.6 秒內完成,而儲存過程僅對 SP 本身就花費了**321.5 秒。**為什麼會發生這種情況,我該如何優化?

這裡有一個建議:

CREATE TEMPORARY TABLE csv
( i int not null
, c1 ... not null
, c2 ... not null
, c3 ... not null
, c4 ... not null )

將 csv 文件載入到臨時表中,並使用連接獲取結果。

SELECT csv.i 
  table1.c1, 
  table1.c2, 
  csv.st_dist_spere, 
  ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]),  
                     point(table1.c3, table1.c4)) * 0.2 AS length, 
  table1.c5 
FROM   table1
CROSS JOIN csv  
WHERE  table1.c1 IN (
                SELECT DISTINCT table2.c1 
                FROM      table2 
                LEFT JOIN table2.c1=table3.c1 
                WHERE table2.c2=1 
                AND   table2.c3 BETWEEN 1000 AND 2000
HAVING length < csv.c4 
  AND    table1.c5 BETWEEN date(start_date) AND date(end_date) 
ORDER BY csv.i, table.c1

然後我會看看IN謂詞(我假設這是它應該是的):

SELECT DISTINCT table2.c1 
FROM      table2 
LEFT JOIN table3
   ON table2.c1=table3.c1 
WHERE table2.c2=1 
 AND table2.c3 BETWEEN 1000 AND 2000 

可以消除針對 table3 的外連接(智能優化器應該意識到這一點,但我不知道這是否在 MySQL 中實現):

SELECT DISTINCT table2.c1 
FROM      table2 
WHERE table2.c2=1 
 AND table2.c3 BETWEEN 1000 AND 2000

這給我們留下了:

SELECT csv.i 
  table1.c1, 
  table1.c2, 
  csv.st_dist_spere, 
  ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]),  
                     point(table1.c3, table1.c4)) * 0.2 AS length, 
  table1.c5 
FROM   table1
CROSS JOIN csv  
WHERE  table1.c1 IN (
                SELECT DISTINCT table2.c1 
                FROM      table2 
                WHERE table2.c2=1 
                AND   table2.c3 BETWEEN 1000 AND 2000
      )
HAVING length < csv.c4 
  AND    table1.c5 BETWEEN date(start_date) AND date(end_date) 
ORDER BY csv.i, table.c1

下一步,嘗試將 替換IN PREDICATEJOIN

SELECT csv.i 
  table1.c1, 
  table1.c2, 
  csv.st_dist_spere, 
  ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]),  
                     point(table1.c3, table1.c4)) * 0.2 AS length, 
  table1.c5 
FROM   table1
CROSS JOIN csv  
JOIN ( SELECT DISTINCT table2.c1 
      FROM      table2 
      WHERE table2.c2=1 
      AND   table2.c3 BETWEEN 1000 AND 2000
    ) as X
  ON X.c1 = table1.c1
HAVING length < csv.c4 
  AND    table1.c5 BETWEEN date(start_date) AND date(end_date) 
ORDER BY csv.i, table.c1 

在進行交叉連接之前將範圍謂詞推入 table1 可能是有利的:

SELECT csv.i 
  table1.c1, 
  table1.c2, 
  csv.st_dist_spere, 
  ST_Distance_Sphere(point(csv.c2[i], csv.c3[i]),  
                     point(table1.c3, table1.c4)) * 0.2 AS length, 
  table1.c5 
FROM (  
   select ... from table1 
   where table1.c5 BETWEEN date(start_date) AND date(end_date)
) as table1
CROSS JOIN csv  
[....]

引用自:https://dba.stackexchange.com/questions/194155