Mysql

兩次更新導致死鎖

  • March 13, 2018

我有一個帶有以下sheme的mysql隊列:

CREATE TABLE IF NOT EXISTS `media` (
 `mid` int(11) NOT NULL AUTO_INCREMENT,
 `order_type` int(11) NOT NULL,
 `media_id` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
 `package` int(11) NOT NULL,
 `sent` int(11) NOT NULL,
 `timestamp` datetime NOT NULL,
 `t_check` int(11) NOT NULL,
 `performed_by` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
 `last_run` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `flag` int(11) NOT NULL,
 `package_sent_diff` int(11) AS (package-sent) PERSISTENT,
 PRIMARY KEY (`mid`),
 KEY `last_run` (`last_run`),
 KEY `performed_by` (`performed_by`),
 KEY `sent` (`sent`),
 KEY `package` (`package`),
 KEY `sent_package` (`sent`,`package`),
 KEY `package_sent_diff` (`package_sent_diff`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=238343961;

package 是工作量, sent 是執行任務的頻率。如果sent < package有工作可供工人接受。package_sent_diff用於索引查詢性能必須完成的數量。performed_by用於將工作分配給工人。

MariaDB 版本 10.0.20,該表有超過 200 萬行。

自動送出已開啟

幾個工作人員基於此隊列執行。

簡單地概述他們的查詢工作:

一名工人執行更新以聲明 10 個未使用的工作

UPDATE media 
SET performed_by = '71602155f18ac6001eb', 
 last_run = NOW() 
WHERE flag = 0 
 AND t_check < 75 
 AND sent < 20 
 AND last_run < NOW() 
 AND performed_by IS NULL 
 AND sent < 4377 
 AND package_sent_diff > 0 
LIMIT 10;

然後工人從表中選擇所有資訊

SELECT * FROM media WHERE performed_by = '71602155f18ac6001eb'

然後它執行作業並將作業的發送值加一併釋放該行以供以後執行

UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = 238323961

此時我經常看到package_sent_diffkey發生死鎖

簡化程式碼:

$unique = uniqid();
$mysqli->query("UPDATE media SET performed_by = '".$unique."', last_run = NOW() WHERE flag = 0 AND t_check < 75 AND sent < 20 AND last_run < NOW() AND performed_by IS NULL AND sent < 4377 AND package_sent_diff > 0 LIMIT 10");
$query = $mysqli->query("SELECT * FROM media WHERE performed_by = '".$unique."'");

while($job = $query->fetc_assoc()) {
   if(doJob($job)) {
       $mysqli->query("UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = ".$job['mid']);
   }
}

這是結果SHOW ENGINE INNODB STATUS

------------------------
LATEST DETECTED DEADLOCK
------------------------
2015-09-10 14:09:22 7f7f27fb6700
*** (1) TRANSACTION:
TRANSACTION 2544765863, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 1
MySQL thread id 725683, OS thread handle 0x7f7f50fb6700, query id 989929718 localhost media_queue updating
UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = 238323961
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2565 page no 59448 n bits 768 index `package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 13  trx id 2544765863 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
*** (2) TRANSACTION:
TRANSACTION 2544765389, ACTIVE 1 sec fetching rows, thread declared inside InnoDB 2350
mysql tables in use 1, locked 1
3029 lock struct(s), heap size 357928, 25327 row lock(s)
MySQL thread id 725748, OS thread handle 0x7f7f27fb6700, query id 989929439 localhost media_queue updating
UPDATE media SET performed_by = '71602155f18ac6001eb', last_run = NOW() WHERE flag = 0 AND t_check < 75 AND sent < 20 AND last_run < NOW() AND performed_by IS NULL AND sent < 4377 AND package_sent_diff > 0 LIMIT 10
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2565 page no 59448 n bits 768 index `package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 13  trx id 2544765389 lock_mode X lock hold time 0 wait time before grant 0 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2565 page no 55655 n bits 1192 index `package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 13  trx id 2544765389 lock_mode X waiting lock hold time 0 wait time before grant 0 
*** WE ROLL BACK TRANSACTION (1)

要求作業的查詢和更新列的查詢似乎sent相互衝突,從而導致死鎖。

執行的工人越多,我看到死鎖的頻率就越高

我經常讀到一個應用程序必須預料到死鎖,並且應該簡單地重新發出導致死鎖的查詢,但這並沒有給我帶來任何結果。

無論我多久重複一次UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = 238323961查詢,3 次或 20 次,如果該查詢已經遇到死鎖,它總是會導致死鎖。

我怎樣才能改變事情以擺脫僵局?保持快速查詢很重要,這樣我就可以在該表上執行 100 多個工作人員,而不會使其速度過慢。

//編輯:

添加複合索引後

ALTER TABLE  `media_queue`.`media` ADD INDEX  `performed_by_package_sent_diff` (  `performed_by` ,  `package_sent_diff` ) COMMENT  '';

並將第二個查詢修改為

UPDATE media SET sent = sent + 3, performed_by = NULL WHERE mid = 243674295 AND performed_by = '92817855f2e21978b76'

我看到的死鎖減少了 10 倍,但死鎖仍然會發生。

這是SHOW ENGINE INNODB STATUS輸出:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2015-09-11 15:48:29 7f7f487ff700
*** (1) TRANSACTION:
TRANSACTION 2585082648, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 1
MySQL thread id 748336, OS thread handle 0x7f7f255b7700, query id 1018467234 localhost media_queue updating
UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = 244045072 AND performed_by = '11514555f2f24e094e7'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2587 page no 61522 n bits 784 index `performed_by_package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 41  trx id 2585082648 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
*** (2) TRANSACTION:
TRANSACTION 2585081924, ACTIVE 2 sec fetching rows, thread declared inside InnoDB 1852
mysql tables in use 1, locked 1
4559 lock struct(s), heap size 521768, 66353 row lock(s)
MySQL thread id 748490, OS thread handle 0x7f7f487ff700, query id 1018466800 localhost media_queue Searching rows for update
UPDATE media SET performed_by = '35359855f2f4a5a7eaf', last_run = NOW() WHERE flag = 0 AND t_check < 75 AND sent < 20 AND last_run < NOW() AND performed_by IS NULL AND sent < 4370 AND package_sent_diff > 0 LIMIT 10
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2587 page no 61522 n bits 784 index `performed_by_package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 41  trx id 2585081924 lock_mode X lock hold time 1 wait time before grant 0 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2587 page no 73537 n bits 520 index `performed_by_package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 41  trx id 2585081924 lock_mode X waiting lock hold time 0 wait time before grant 0 
*** WE ROLL BACK TRANSACTION (1)

同樣在 90% 的情況下,如果在死鎖發生後重複查詢,我可以強制查詢通過。

這是盡可能優化還是可以完全擺脫死鎖?

添加ORDER BY mid 可能會防止死鎖。

但我真正的答案是“忍受它”。也就是說,認識到您遇到了死鎖並簡單地重新執行UPDATE。你被放慢了一點,但除此之外沒有造成任何傷害。

無關:

KEY `sent` (`sent`),
KEY `sent_package` (`sent`,`package`),

其中第一個是多餘的,可以刪除。

答案應該是修改您選擇的更新,讀取與更新 where 子句相關的主鍵,然後使用其主鍵單獨更新每一行。

我在大約 100 個 mariadb 數據庫的環境中工作,每個數據庫大約 32-40 個 cpu,384gbram,每個節點都有 6tb 到 33tb 的​​ mariadb 數據庫。從字面上看,每秒執行 300,000 到 500,000 筆交易,具體取決於交易量。

許多服務可以在 SUPER HIGH VOLUME 環境中執行的唯一方法是 100% 所有更新和刪除都必須按主鍵,這意味著您只需要單獨的行級鎖。這是當今 mysql 數據庫上可用的最精細的鎖定集。

引用自:https://dba.stackexchange.com/questions/114708