Why do queries slow down after a few minutes when trying to build a database?

July 2, 2017

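First off: I'm new to DBA work. Please be gentle. (And I'm fairly new to Python, too.) Oh, and I don't know how to change any MySQL settings, so all the MySQL settings are just the defaults. I'm running Ubuntu 14.04 on an AMD64 machine with 8 GB of RAM.
I have a huge Excel (xlsx) spreadsheet (about 20K rows and 100 columns) that came from an old database (which I don't have access to), and I'm trying to use it to build a new MySQL database. I'm using Python, with openpyxl and MySQLdb. I wrote the following:
(There's more to it, but I believe everything you might need is here. The code runs fine; it just becomes unbearably slow after many iterations.)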
theSList = []
theBigList = []
for col in range(1, ws.get_highest_column() + 1):
   myStr1 = str(ws.cell(column = col, row=2).value)
   myStr1 = myStr1.translate(string.maketrans("",""),string.punctuation)
   myStr1 = ''.join(myStr1.split())            
   theBigList.append(myStr1)
   theSList.append('%s')

theDeetsString = 'INSERT INTO hf_fund_details(' + ", ".join([theBigList[i - 1] for i in deetsCols]) + ')' + ' VALUES (' + ", ".join([theSList[i - 1] for i in deetsCols]) + ')'
theMgrDeetsString = 'INSERT INTO hf_mgr_details (' + ", ".join([theBigList[i - 1] for i in mgrCols]) + ')' + ' VALUES (' + ", ".join([theSList[i - 1] for i in mgrCols]) + ')'
theFundStratString = 'INSERT INTO hf_strat_details (' + ", ".join([theBigList[i - 1] for i in stratCols]) + ')' + ' VALUES (' + ", ".join([theSList[i - 1] for i in stratCols]) + ')'
theFundIDsString = 'INSERT INTO hf_id_details (' + ", ".join([theBigList[i - 1] for i in idCols]) + ')' + ' VALUES (' + ", ".join([theSList[i - 1] for i in idCols]) + ')'
theFundFeeString = 'INSERT INTO hf_fee_details (' + ", ".join([theBigList[i - 1] for i in feeCols]) + ')' + ' VALUES (' + ", ".join([theSList[i - 1] for i in feeCols]) + ')'
theFundSPsString = 'INSERT INTO hf_servpro_details (' + ", ".join([theBigList[i - 1] for i in servproCols]) + ')' + ' VALUES (' + ", ".join([theSList[i - 1] for i in servproCols]) + ')'

con = mdb.connect('localhost', 'root', 'K2Kill3rs', 'igp_hf_db');

with con:
   cur = con.cursor()

   cur.execute("DROP TABLE IF EXISTS hf_mgr_details")
   cur.execute("DROP TABLE IF EXISTS hf_fund_details")
   cur.execute("DROP TABLE IF EXISTS hf_strat_details")
   cur.execute("DROP TABLE IF EXISTS hf_id_details")
   cur.execute("DROP TABLE IF EXISTS hf_fee_details")
   cur.execute("DROP TABLE IF EXISTS hf_servpro_details")

   cur.execute("CREATE TABLE hf_fund_details(" + theDeetsStr + ")")
   cur.execute("CREATE TABLE hf_mgr_details(" + theMgrDeetsStr + ")")
   cur.execute("CREATE TABLE hf_strat_details(" + theFundStratStr + ")")
   cur.execute("CREATE TABLE hf_id_details(" + theFundIDsStr + ")")
   cur.execute("CREATE TABLE hf_fee_details(" + theFundFeeStr + ")")
   cur.execute("CREATE TABLE hf_servpro_details(" + theFundSPsStr + ")")

for row in range(4,int(ws.get_highest_row())):
   if ws.cell(column = 1, row=row).value is None:
       continue
   theValueList = []
   for col in range(1, ws.get_highest_column() + 1):
       theCellValue = ws.cell(column = col, row=row).value     
       if theCellValue is not None:
           if theCellValue == 'Yes':   
               theValueList.append(str(1))
           elif theCellValue == 'No':  
               theValueList.append(str(0))
           elif type(theCellValue) is UnicodeType: 
               theValueList.append(theCellValue.encode('ascii', 'ignore'))
           else:
               theValueList.append(theCellValue)
       else:
           theValueList.append(0)

   with con:
       cur = con.cursor()

       cur.execute(theDeetsString, tuple([theValueList[i - 1] for i in deetsCols]))
       cur.execute(theMgrDeetsString, tuple([theValueList[i - 1] for i in mgrCols]))
       cur.execute(theFundStratString, tuple([theValueList[i - 1] for i in stratCols]))
       cur.execute(theFundIDsString, tuple([theValueList[i - 1] for i in idCols]))
       cur.execute(theFundFeeString, tuple([theValueList[i - 1] for i in feeCols]))
       cur.execute(theFundSPsString, tuple([theValueList[i - 1] for i in servproCols]))

   #ver = cur.fetchone()
   #print "Database version : %s " % ver
   print 'Completed Row #' + str(row) + ' at ' + datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S')


if con:    
   con.close()
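So this drops the old tables (the ones I created on my previous attempt), creates new tables, then loops over every row of the Excel file, builds 6 queries, and puts the data into 6 different tables. It starts out very fast, executing all 6 queries for each row in under a second. Within a few minutes, however, it slows down dramatically, and within 30 minutes each iteration of the loop (1 Excel row, 6 queries) takes about 5-10 minutes to execute.
After playing around with it, I'm fairly sure the problem isn't openpyxl but the queries to MySQL. It isn't any particular table or query; each query takes only a few seconds to complete. Is there anything obviously wrong here?
As I understand it, MySQL does a lot of maintenance in the background while all this is going on, which may be what is slowing everything down. I think it is trying to optimize the tables while I'm still trying to build them. Is there a way to put MySQL into a "build mode" so it doesn't try to do this? Is there a way to build one massive query with all of these rows? Or to run the queries a few hundred rows at a time? (The Excel file has about 20K rows of data I need to get through.)
Again, these are the thoughts of someone who has never worked with DBs before, but if I'm being stupid, feel free to tell me.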

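The slowness you are seeing could have several causes, one or more of them at once.
Note that 20K rows is a relatively small number of rows.
You may want to check the following:
What the structure of the tables is: engine, keys (indexes), and primary keys.
Print the query after a certain number of iterations, to make sure your queries aren't accumulating and growing huge.
The slow query log may offer some hints.
Depending on the cause you "might" find, a solution would follow, for example:
Disable (or drop) the keys and re-enable them once the import is finished.
Check that each table has a proper primary key, especially if the engine you are using is InnoDB.
Use bulk inserts.
Use LOAD DATA.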
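As a rough illustration of the bulk-insert suggestion: in the question's code, the `with con:` block inside the row loop appears to commit once per spreadsheet row, so one option is to collect the rows first and send them in batches with one commit per batch. The sketch below is written for Python 3 and assumes `con` is a DB-API connection such as the one returned by `MySQLdb.connect`; the helper names `chunked` and `insert_in_batches` and the batch size of 500 are made up for this example.

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def insert_in_batches(con, sql, rows, batch_size=500):
    """Insert `rows` using executemany(), committing once per batch.

    `con` is assumed to be a DB-API connection (e.g. MySQLdb);
    `sql` is a parameterized INSERT such as theDeetsString.
    Returns the number of rows sent.
    """
    cur = con.cursor()
    total = 0
    for batch in chunked(rows, batch_size):
        cur.executemany(sql, batch)
        con.commit()  # one commit per batch instead of one per row
        total += len(batch)
    cur.close()
    return total
```

In the question's loop this would mean appending `tuple(theValueList[i - 1] for i in deetsCols)` to a list for each table instead of calling `cur.execute` right away, then calling `insert_in_batches(con, theDeetsString, deets_rows)` once after the loop (where `deets_rows` is that hypothetical list). Whether this removes the slowdown depends on what is actually causing it.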
HTH
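For the LOAD DATA suggestion in the list above, one possible shape is to dump the cleaned rows to a CSV file and let MySQL read the file in a single statement. This is only a sketch for Python 3, under several assumptions: the connection was opened with local file loading enabled (for MySQLdb, `local_infile=1` in `connect`), the server permits `LOAD DATA LOCAL`, and the data contains no backslashes (MySQL treats them as escape characters by default). The function names are hypothetical.

```python
import csv


def write_rows_to_csv(path, rows):
    """Write rows (sequences of values) to a CSV file; return the row count."""
    count = 0
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def build_load_data_sql(table, columns):
    """Build a LOAD DATA statement; the file path stays a %s parameter.

    Table and column names cannot be passed as parameters, so they must
    come from trusted values (here, the names already used in the script).
    """
    template = (
        "LOAD DATA LOCAL INFILE %s INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' ({columns})"
    )
    return template.format(table=table, columns=", ".join(columns))


def load_rows(con, table, columns, rows, path):
    """Dump rows to `path`, then load them into `table` in one statement."""
    count = write_rows_to_csv(path, rows)
    cur = con.cursor()
    cur.execute(build_load_data_sql(table, columns), (path,))
    con.commit()
    cur.close()
    return count
```

Note that the csv module writes `None` as an empty field, which LOAD DATA does not treat as NULL; the question's code already replaces empty cells with 0, so that case should not come up there.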

In addition to @jehad's answer, run these (or add them to the MySQL configuration file and restart the database).
set global innodb_flush_log_at_trx_commit=2
set global general_log=0
This should make the inserts faster.
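If it is more convenient to issue the two statements above from the import script itself, a minimal sketch could look like the following. It assumes the MySQL account has the privilege to change global variables, and the names `BULK_LOAD_SETTINGS` and `apply_global_settings` are invented for this example. Keep in mind that `innodb_flush_log_at_trx_commit=2` trades durability for speed: roughly the last second of committed transactions can be lost if the operating system crashes or power fails. Changes made with SET GLOBAL also do not survive a server restart.

```python
# Settings taken from the answer above; the default for
# innodb_flush_log_at_trx_commit is 1.
BULK_LOAD_SETTINGS = (
    ("innodb_flush_log_at_trx_commit", 2),
    ("general_log", 0),
)


def apply_global_settings(cur, settings=BULK_LOAD_SETTINGS):
    """Run SET GLOBAL for each (name, value) pair; return the statements.

    `cur` is assumed to be a DB-API cursor. Names come from the fixed
    tuple above (not from user input) and values are coerced to int.
    """
    executed = []
    for name, value in settings:
        stmt = "SET GLOBAL %s = %d" % (name, int(value))
        cur.execute(stmt)
        executed.append(stmt)
    return executed
```

After the import finishes, calling `apply_global_settings(cur, (("innodb_flush_log_at_trx_commit", 1),))` would put the flush setting back to its default.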

Source: https://dba.stackexchange.com/questions/106189
