MySQL 停留在“wsrep:為寫入集啟動複製 (-1)”
我最近遇到了 MySQL 的情況,這是我以前從未見過的。我們有一個有 3 個節點的 Percona 集群。主伺服器停止處理查詢,我們託管的 PHP FPM Web 應用程序變得無響應。當我檢查
SHOW PROCESSLIST
時,MySQL 程序處於以下狀態:wsrep: initiating replication for write set (-1)
在 PHP 將所有查詢定向到的主伺服器和兩個輔助伺服器之一上都是這種情況。這是我看到的輸出
SHOW PROCESSLIST
:mysql> show processlist; +-------+-------------+--------------------+-----------------+---------+-------+--------------------------------------------------------+--------------------------------------------------------------------------------------+-----------+---------------+ | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | +-------+-------------+--------------------+-----------------+---------+-------+--------------------------------------------------------+--------------------------------------------------------------------------------------+-----------+---------------+ | 1 | system user | | NULL | Sleep | 2171 | wsrep: committing write set (542480920) | NULL | 0 | 0 | | 2 | system user | | NULL | Sleep | 17169 | wsrep: aborter idle | NULL | 0 | 0 | | 4 | system user | | NULL | Sleep | 3250 | wsrep: deleting row for write-set (542480919) | NULL | 0 | 0 | | 46944 | $user1 | 172.24.62.92:54004 | $user1_db1 | Query | 2158 | wsrep: initiating pre-commit for write set (542481004) | delete from $table where $col < '$val' | 0 | 1 | | 47126 | $user1 | 172.24.62.92:54745 | $user1_db2 | Query | 2096 | wsrep: initiating replication for write set (-1) | update $table2 set $col = current_timestamp where $col2 = 393 and $col3 = 176935 | 0 | 1 | | 47155 | $user1 | 172.24.62.92:54841 | $user1_db3 | Query | 2089 | wsrep: initiating replication for write set (-1) | UPDATE $table SET $col5 = 'something' WHERE $somecol = '$someval' | 0 | 1 | | 47416 | $user1 | 172.24.62.92:55891 | $user1_db3 | Query | 1950 | wsrep: initiating replication for write set (-1) | UPDATE $table SET $col5 = 'something' WHERE $somecol = 's' | 0 | 1 | | 47576 | $user1 | 172.24.62.92:56493 | $user1_db3 | Query | 1849 | wsrep: initiating replication for write set (-1) | INSERT INTO $table3 ($column6, $column7, $column8, $column9, $column10 ... | 0 | 0 | | 47654 | $user1 | 172.24.62.92:56808 | $user1_db2 | Query | 1924 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 43625 a | 0 | 1 | | 48036 | $user1 | 172.24.62.92:58343 | $user1_db2 | Query | 1795 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 248528 | 0 | 1 | | 48936 | $user1 | 172.24.62.92:61929 | $user1_db2 | Query | 1495 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 156001 | 0 | 1 | | 48952 | $user1 | 172.24.62.92:61982 | $user1_db2 | Query | 1490 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 205495 | 0 | 1 | | 49497 | $user1 | 172.24.62.92:64167 | $user1_db2 | Query | 1306 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 234457 | 0 | 1 | | 49510 | $user1 | 172.24.62.92:64218 | $user1_db2 | Query | 1302 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 209489 | 0 | 1 | | 49839 | $user1 | 172.24.62.92:65534 | $user1_db2 | Query | 1192 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 70958 a | 0 | 1 | | 49970 | $user1 | 172.24.62.92:1539 | $user1_db2 | Query | 1096 | wsrep: initiating replication for write set (-1) | update $table set $col11 = $col11 + 1 where id = $val | 0 | 1 | | 50292 | $user1 | 172.24.62.92:2819 | $user1_db2 | Query | 1041 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 193078 | 0 | 1 | | 50398 | $user1 | 172.24.62.92:3240 | $user1_db2 | Query | 1006 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 242842 | 0 | 1 | | 51120 | $user1 | 172.24.62.92:6135 | $user1_db2 | Query | 763 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 173382 | 0 | 1 | | 51453 | $user1 | 172.24.62.92:7456 | $user1_db1 | Query | 653 | wsrep: initiating replication for write set (-1) | delete from $table5 where expiry < 1496379436 | 0 | 2 | | 51460 | $user1 | 172.24.62.92:7475 | $user1_db1 | Query | 651 | wsrep: initiating replication for write set (-1) | insert into $table5 values ('... | 0 | 0 | | 51504 | $user1 | 172.24.62.92:7646 | $user1_db1 | Query | 587 | wsrep: initiating replication for write set (-1) | insert into $table5 values ('... | 0 | 0 | | 51525 | $user1 | 172.24.62.92:7721 | $user1_db1 | Query | 631 | wsrep: initiating replication for write set (-1) | insert into $table5 values ('... | 0 | 0 | | 51998 | $user1 | 172.24.62.92:9585 | $user1_db2 | Query | 475 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 203223 | 0 | 1 | | 52290 | $user1 | 172.24.62.92:10759 | $user1_db2 | Query | 377 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 185874 | 0 | 1 | | 53055 | $user1 | 172.24.62.92:13797 | $user1_db2 | Query | 123 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 89879 a | 0 | 1 | | 53303 | $user1 | 172.24.62.92:14793 | $user1_db2 | Query | 39 | wsrep: initiating replication for write set (-1) | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 146551 | 0 | 1 | | 53396 | $user1 | 172.24.62.92:15176 | $user1_db2 | Query | 7 | updating | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 146551 | 0 | 0 | | 53403 | $user1 | 172.24.62.92:15205 | $user1_db2 | Query | 5 | updating | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 146551 | 0 | 0 | | 53410 | root | localhost | NULL | Query | 0 | starting | show processlist | 0 | 0 | +-------+-------------+--------------------+-----------------+---------+-------+--------------------------------------------------------+--------------------------------------------------------------------------------------+-----------+---------------+ 30 rows in set (0.00 sec)
(更改表和列名稱以保護無辜)
我無法弄清楚如何從這種情況中恢復,我們最終不得不重新啟動整個集群(滾動重啟)並使用重新啟動集群
service mysql bootstrap-pxc
我們剛剛升級到 Percona 5.7…我們還更改了MySQL Docs
SET GLOBAL tx_isolation='READ-COMMITTED';
引用的Set Transactions ,以嘗試提高 SELECT 的性能。我們不確定這些是否可能導致我們遇到的情況。如果 MySQL 陷入狀態
wsrep: initiating replication for write set (-1)
,這意味著什麼?有哪些可能的原因(和修復)?
似乎像這樣的總崩潰可能是由於一些錯誤,可能在 5.7.17 中引入並可能在 5.7.20 中修復,參考https://jira.percona.com/browse/PXC-877
我注意到我們的系統中有很多查詢掛起相同的狀態消息,但問題在一兩分鐘內自行解決,因此可能不是同一個錯誤。如果另一個主人正在做一些大的寫操作,這似乎發生在一個(或多個)主人身上。我確實曾經關閉整個集群以嘗試清除積壓 - 這是一個錯誤,導致相當長的停機時間,更正確的解決方法是將 mysql 流量重定向到未阻塞的一台伺服器並等待其他人自己冷靜下來。
$ mysqld –version mysqld Ver 5.7.23-23-57 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel23, Revision f5578f0, WSREP version 31.31, wsrep_31.31)
(這不能回答所提出的問題,但它可能會消除問題。)
聽起來您缺少複合材料
INDEX($col1, $col2)
(按任意順序)。而且桌子比較大。
在這兩者之間,每個
UPDATE
似乎都需要一兩分鐘,顯然阻止了下一個UPDATE
。