Fixing MySQL replication broken by underlying storage corruption

March 15, 2022

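We have critical database servers running on two virtual machines. One of the VMs crashed because of a problem with the underlying storage, and we had to move it to different storage media and repair the XFS file system.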

After starting the crashed VM, we noticed that MySQL replication was broken and our application was not working properly.

From server_01 (slave):

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
              Slave_IO_State:
                 Master_Host: x.x.x.10
                 Master_User: slave
                 Master_Port: 3306
               Connect_Retry: 60
             Master_Log_File: mysql-bin.006822
         Read_Master_Log_Pos: 484378856
              Relay_Log_File: mysqld-server-relay-bin.015091
               Relay_Log_Pos: 404689852
       Relay_Master_Log_File: mysql-bin.006822
            Slave_IO_Running: No
           Slave_SQL_Running: Yes
             Replicate_Do_DB:
         Replicate_Ignore_DB: mysql
          Replicate_Do_Table:
      Replicate_Ignore_Table:
     Replicate_Wild_Do_Table:
 Replicate_Wild_Ignore_Table:
                  Last_Errno: 0
                  Last_Error:
                Skip_Counter: 0
         Exec_Master_Log_Pos: 484378856
             Relay_Log_Space: 404690059
             Until_Condition: None
              Until_Log_File:
               Until_Log_Pos: 0
          Master_SSL_Allowed: No
          Master_SSL_CA_File:
          Master_SSL_CA_Path:
             Master_SSL_Cert:
           Master_SSL_Cipher:
              Master_SSL_Key:
       Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
               Last_IO_Errno: 1236
               Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position; the first event 'mysql-bin.006822' at 484378856, the last event read from '/var/log/mysql/mysql-bin.006822' at 4, the last byte read from '/var/log/mysql/mysql-bin.006822' at 4.'
              Last_SQL_Errno: 0
              Last_SQL_Error:
 Replicate_Ignore_Server_Ids:
            Master_Server_Id: 2
1 row in set (0.00 sec)

From server_02 (master, crashed and recovered):

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
              Slave_IO_State:
                 Master_Host: x.x.x.11
                 Master_User: slave
                 Master_Port: 3306
               Connect_Retry: 60
             Master_Log_File: mysql-bin.011022
         Read_Master_Log_Pos: 480910234
              Relay_Log_File: mysqld-server-relay-bin.003852
               Relay_Log_Pos: 162009
       Relay_Master_Log_File: mysql-bin.011022
            Slave_IO_Running: No
           Slave_SQL_Running: No
             Replicate_Do_DB:
         Replicate_Ignore_DB: mysql
          Replicate_Do_Table:
      Replicate_Ignore_Table:
     Replicate_Wild_Do_Table:
 Replicate_Wild_Ignore_Table:
                  Last_Errno: 0
                  Last_Error:
                Skip_Counter: 0
         Exec_Master_Log_Pos: 480909976
             Relay_Log_Space: 0
             Until_Condition: None
              Until_Log_File:
               Until_Log_Pos: 0
          Master_SSL_Allowed: No
          Master_SSL_CA_File:
          Master_SSL_CA_Path:
             Master_SSL_Cert:
           Master_SSL_Cipher:
              Master_SSL_Key:
       Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
               Last_IO_Errno: 0
               Last_IO_Error:
              Last_SQL_Errno: 0
              Last_SQL_Error:
 Replicate_Ignore_Server_Ids:
            Master_Server_Id: 0
1 row in set (0.00 sec)
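Comparing two outputs like these field by field is easier once the vertical (`\G`) format is parsed into a dictionary. The following is a minimal Python sketch, assuming the single-row `Field: value` layout shown above; `parse_slave_status` is a hypothetical helper name, and the sample string is a short excerpt of server_01's output.

```python
import re


def parse_slave_status(text: str) -> dict[str, str]:
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict.

    Assumes one row. The prompt line, the '*** 1. row ***' header and the
    trailing '1 row in set' line do not match the pattern and are skipped.
    """
    status = {}
    for line in text.splitlines():
        # Field name, colon, then everything after it: values such as
        # Last_IO_Error contain further colons and must be kept whole.
        match = re.match(r"^\s*(\w+):\s?(.*)$", line)
        if match:
            status[match.group(1)] = match.group(2).strip()
    return status


server_01 = """\
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
        Relay_Master_Log_File: mysql-bin.006822
          Exec_Master_Log_Pos: 484378856
"""
status = parse_slave_status(server_01)
print(status["Slave_IO_Running"], status["Relay_Master_Log_File"])  # No mysql-bin.006822
```

Pasting both full outputs into two such dictionaries makes it straightforward to print only the fields that differ.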

Comparing the two outputs, I can see that the slave's Relay_Log_Pos is ahead of the master's. Does this mean the slave has newer data than the master?

I have been reading about recovering from this with the following commands, but I'm not sure it's the right approach.

STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

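If the slave is ahead of the master, can I make the slave the master database and replicate in the other direction? Would I lose any data this way?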

Or what is the best way to recover with minimal data loss? We currently don't have any backups.

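You didn't mention which version of MySQL you are running or what type of replication you are using, so I'm guessing from the output that it's 5.7 or earlier and that you're not using GTIDs.
In my experience, this error usually occurs after the master has shut down unexpectedly.
What happens:
Normal operation:
1. The slave I/O thread requests new data from the master and starts receiving it.
2. The master rotates its binary log to the next file and tells the slave to start reading from the next binary log.
3. The slave starts reading from the next binary log.
Crash:
4. The master crashes.
5. The slave disconnects.
6. The master comes back.
7. If the old log is corrupted, the master starts a new binlog (as it does on any normal startup).
8. The slave reconnects and requests the next position in the old binary log.
9. That position does not exist, because the log has been closed and the master has moved on. The slave is trying to read from an "impossible position".
10. Replication stops.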
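The failure in the last few steps comes down to a simple check on the master: the requested file must exist, and the requested position must not lie past the end of it. The toy Python model below is a hypothetical simplification, not MySQL's implementation: it tracks only file sizes, the file names are made up, and it models the damaged file as surviving with just its 4-byte header, which matches the "last byte read ... at 4" in the question's error message.

```python
BINLOG_HEADER_SIZE = 4  # every binlog file starts with a 4-byte magic number


class ToyMaster:
    """Hypothetical toy model of a master: it tracks only each binlog's size."""

    def __init__(self) -> None:
        self.binlogs: dict[str, int] = {}  # file name -> size in bytes
        self.current = ""

    def start_new_binlog(self, name: str) -> None:
        # Happens on rotation and on every server startup.
        self.binlogs[name] = BINLOG_HEADER_SIZE
        self.current = name

    def write_events(self, nbytes: int) -> None:
        self.binlogs[self.current] += nbytes

    def request(self, name: str, pos: int) -> str:
        """Answer a slave that asks to read `name` from byte offset `pos`."""
        size = self.binlogs.get(name)
        if size is None:
            return "error 1236: log file not found"
        if pos > size:
            return "error 1236: impossible position"
        return "ok"


master = ToyMaster()
master.start_new_binlog("mysql-bin.000001")
master.write_events(996)                  # the file is now 1000 bytes long
slave_pos = ("mysql-bin.000001", 1000)    # the slave has read all of it

# Crash: the damaged file keeps only its header, and the restarted
# master opens a fresh binlog.
master.binlogs["mysql-bin.000001"] = BINLOG_HEADER_SIZE
master.start_new_binlog("mysql-bin.000002")

print(master.request(*slave_pos))                              # error 1236: impossible position
print(master.request("mysql-bin.000002", BINLOG_HEADER_SIZE))  # ok
```

The second request is what the fix below does: instead of asking for a position that no longer exists, the slave starts at the beginning of the binlog the master opened after restarting.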
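Solution: On the slave, run SHOW SLAVE STATUS and note Relay_Master_Log_File. This should be the binary log on the master that was being read when the crash happened.
On the master, look for the binary log with that name; its Date Modified time should be the time of the crash.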
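For the check on the master, a short Python sketch that lists the binlog files with their sizes and modification times makes it easier to spot the file last written at the time of the crash. It assumes the binlogs live in /var/log/mysql (the path that appears in the question's error message) and use the `mysql-bin` base name; `list_binlogs` is a hypothetical helper, and `ls -l` on that directory shows the same information.

```python
import re
from datetime import datetime
from pathlib import Path


def list_binlogs(directory: str, basename: str = "mysql-bin") -> list[tuple[str, int, datetime]]:
    """List binlog files as (name, size_bytes, modified_time), in sequence order.

    Only names like 'mysql-bin.006822' are kept, so the 'mysql-bin.index'
    file and unrelated files are skipped.
    """
    pattern = re.compile(re.escape(basename) + r"\.\d{6}")
    rows = []
    for path in Path(directory).iterdir():
        if path.is_file() and pattern.fullmatch(path.name):
            info = path.stat()
            rows.append((path.name, info.st_size,
                         datetime.fromtimestamp(info.st_mtime)))
    # Zero-padded sequence numbers sort correctly as plain strings.
    return sorted(rows)
```

For example, `list_binlogs("/var/log/mysql")` run on the master returns one tuple per binlog file, oldest first.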
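On the slave, issue a CHANGE MASTER TO command pointing it at the start of the next binary log.
For example, if it was reading binlog.000001 when the crash happened, start reading binlog.000002 from position 4 (the first event after the 4-byte binlog header; MySQL raises a lower value such as 1 to 4):
STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE = 'binlog.000002', MASTER_LOG_POS = 4;
START SLAVE;
Replication should now continue without errors.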
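The steps of the solution can be scripted. This Python sketch derives the next file name from Relay_Master_Log_File, keeping the zero-padded sequence number, and builds the three statements. It assumes the `name.NNNNNN` naming shown in the question and uses position 4, where the first event of a binlog file starts; the helper names are hypothetical, and the generated SQL should be reviewed before it is run.

```python
def next_binlog(name: str) -> str:
    """'binlog.000001' -> 'binlog.000002', keeping the zero padding."""
    base, _, seq = name.rpartition(".")
    if not base or not seq.isdigit():
        raise ValueError(f"not a binlog file name: {name!r}")
    return f"{base}.{int(seq) + 1:0{len(seq)}d}"


def change_master_sql(relay_master_log_file: str, start_pos: int = 4) -> str:
    """Statements that point the slave at the start of the following binlog."""
    nxt = next_binlog(relay_master_log_file)
    return (
        "STOP SLAVE;\n"
        f"CHANGE MASTER TO MASTER_LOG_FILE = '{nxt}', MASTER_LOG_POS = {start_pos};\n"
        "START SLAVE;"
    )


# Relay_Master_Log_File as reported on server_01 in the question.
print(change_master_sql("mysql-bin.006822"))
```

For server_01 this prints a CHANGE MASTER TO statement for mysql-bin.006823 at position 4. Whether that file is really the one the master opened after the crash still has to be confirmed on the master itself.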

Source: https://dba.stackexchange.com/questions/308738
