Replication

MongoDB Replica Set Failover Causing a Rollback Due to an Unreachable SECONDARY

  • May 15, 2018

I have a 3-member replica set with 1 primary, 1 secondary, and 1 arbiter. Due to some apparent network issue, the secondary became unreachable. After some time the secondary became the primary, causing my original primary to go through a rollback (and lose some writes).

Here are the logs from each mongod instance:

Primary (XXX.XX.XX.88:27017):

2018-05-11T06:04:30.441+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Operation timed out
2018-05-11T06:04:42.439+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
2018-05-11T06:04:52.439+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Operation timed out

//********* THE SAME "Error in heartbeat" REPEATS FOR over 20 mins

2018-05-11T06:25:12.514+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Operation timed out
2018-05-11T06:25:18.515+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
2018-05-11T06:25:30.516+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
2018-05-11T06:25:42.223+0000 I NETWORK  [initandlisten] connection accepted from XXX.XX.XX.227:59313 #220 (7 connections now open)
2018-05-11T06:25:42.224+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to XXX.XX.XX.227:27017
2018-05-11T06:25:42.224+0000 I NETWORK  [initandlisten] connection accepted from XXX.XX.XX.227:59315 #221 (8 connections now open)
2018-05-11T06:25:42.229+0000 I REPL     [ReplicationExecutor] Member XXX.XX.XX.227:27017 is now in state SECONDARY
2018-05-11T06:25:42.330+0000 I REPL     [ReplicationExecutor] stepping down from primary, because a new term has begun: 3
2018-05-11T06:25:42.334+0000 I REPL     [replExecDBWorker-2] transition to SECONDARY

Secondary (XXX.XX.XX.227:27017):

2018-05-11T06:25:42.200+0000 W NETWORK  [HostnameCanonicalizationWorker] Failed to obtain address information for hostname ip-XXX-XX-XX-227: Name or service not known
2018-05-11T06:25:42.200+0000 I NETWORK  [SyncSourceFeedback] Socket recv() timeout  XXX.XX.XX.88:27017
2018-05-11T06:25:42.202+0000 I NETWORK  [initandlisten] connection accepted from XXX.XX.XX.145:50720 #449 (4 connections now open)
2018-05-11T06:25:42.203+0000 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2018-05-11T06:25:42.205+0000 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2018-05-11T06:25:42.209+0000 I NETWORK  [initandlisten] connection accepted from XXX.XX.XX.88:60370 #450 (5 connections now open)
2018-05-11T06:25:42.211+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.145:27017; ExceededTimeLimit: Operation timed out
2018-05-11T06:25:42.212+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.88:27017; HostUnreachable: Connection timed out
2018-05-11T06:25:42.213+0000 I REPL     [ReplicationExecutor] could not find member to sync from
2018-05-11T06:25:42.230+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to XXX.XX.XX.88:27017
2018-05-11T06:25:42.232+0000 I NETWORK  [conn420] end connection XXX.XX.XX.88:59979 (4 connections now open)
2018-05-11T06:25:42.232+0000 I NETWORK  [SyncSourceFeedback] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [XXX.XX.XX.88:27017] 
2018-05-11T06:25:42.232+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to XXX.XX.XX.88:27017
2018-05-11T06:25:42.232+0000 I NETWORK  [conn448] end connection XXX.XX.XX.145:33580 (3 connections now open)
2018-05-11T06:25:42.233+0000 I NETWORK  [conn424] end connection XXX.XX.XX.145:59144 (2 connections now open)
2018-05-11T06:25:42.233+0000 I REPL     [ReplicationExecutor] Member XXX.XX.XX.88:27017 is now in state PRIMARY
2018-05-11T06:25:42.233+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to XXX.XX.XX.145:27017
2018-05-11T06:25:42.233+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to XXX.XX.XX.145:27017
2018-05-11T06:25:42.245+0000 I REPL     [ReplicationExecutor] Member XXX.XX.XX.145:27017 is now in state ARBITER
2018-05-11T06:25:42.246+0000 I REPL     [ReplicationExecutor] VoteRequester: Got no vote from XXX.XX.XX.88:27017 because: candidate's data is staler than mine, resp:{ term: 2, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0 }
2018-05-11T06:25:42.254+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update: network error while attempting to run command 'replSetUpdatePosition' on host 'XXX.XX.XX.88:27017' 
2018-05-11T06:25:42.256+0000 I REPL     [SyncSourceFeedback] updateUpstream failed: HostUnreachable: network error while attempting to run command 'replSetUpdatePosition' on host 'XXX.XX.XX.88:27017' , will retry
2018-05-11T06:25:42.298+0000 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
2018-05-11T06:25:42.332+0000 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 3
2018-05-11T06:25:42.360+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
2018-05-11T06:25:42.360+0000 I ASIO     [ReplicationExecutor] dropping unhealthy pooled connection to XXX.XX.XX.88:27017
2018-05-11T06:25:42.360+0000 I ASIO     [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2018-05-11T06:25:42.361+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to XXX.XX.XX.88:27017
2018-05-11T06:25:42.379+0000 I REPL     [ReplicationExecutor] Member XXX.XX.XX.88:27017 is now in state SECONDARY
2018-05-11T06:25:43.241+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted

Arbiter (XXX.XX.XX.145:27017):

2018-05-11T06:04:33.430+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Operation timed out
2018-05-11T06:04:48.428+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Couldn't get a connection within the time limit

//********* THE SAME "Error in heartbeat" REPEATS FOR over 20 mins

2018-05-11T06:25:28.458+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Operation timed out
2018-05-11T06:25:33.458+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to XXX.XX.XX.227:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
2018-05-11T06:25:42.218+0000 I NETWORK  [initandlisten] connection accepted from XXX.XX.XX.227:52658 #35 (3 connections now open)
2018-05-11T06:25:42.219+0000 I NETWORK  [initandlisten] connection accepted from XXX.XX.XX.227:52660 #36 (4 connections now open)
2018-05-11T06:25:43.486+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to XXX.XX.XX.227:27017
2018-05-11T06:25:43.487+0000 I REPL     [ReplicationExecutor] Member XXX.XX.XX.227:27017 is now in state PRIMARY
2018-05-11T06:25:47.203+0000 I ASIO     [ReplicationExecutor] dropping unhealthy pooled connection to XXX.XX.XX.88:27017
2018-05-11T06:25:47.203+0000 I ASIO     [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2018-05-11T06:25:47.205+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to XXX.XX.XX.88:27017
2018-05-11T06:25:47.205+0000 I REPL     [ReplicationExecutor] Member XXX.XX.XX.88:27017 is now in state SECONDARY

New replica set configuration after the failover:

{
   "_id" : "rs0",
   "version" : 13,
   "protocolVersion" : NumberLong(1),
   "members" : [
       {
           "_id" : 1,
           "host" : "XXX.XX.XX.227:27017",
           "arbiterOnly" : false,
           "buildIndexes" : true,
           "hidden" : false,
           "priority" : 1,
           "tags" : {

           },
           "slaveDelay" : NumberLong(0),
           "votes" : 1
       },
       {
           "_id" : 2,
           "host" : "XXX.XX.XX.88:27017",
           "arbiterOnly" : false,
           "buildIndexes" : true,
           "hidden" : false,
           "priority" : 1,
           "tags" : {

           },
           "slaveDelay" : NumberLong(0),
           "votes" : 1
       },
       {
           "_id" : 3,
           "host" : "XXX.XX.XX.145:27017",
           "arbiterOnly" : true,
           "buildIndexes" : true,
           "hidden" : false,
           "priority" : 1,
           "tags" : {

           },
           "slaveDelay" : NumberLong(0),
           "votes" : 1
       }
   ],
   "settings" : {
       "chainingAllowed" : true,
       "heartbeatIntervalMillis" : 2000,
       "heartbeatTimeoutSecs" : 10,
       "electionTimeoutMillis" : 10000,
       "getLastErrorModes" : {

       },
       "getLastErrorDefaults" : {
           "w" : 1,
           "wtimeout" : 0
       },
       "replicaSetId" : ObjectId("5ae2f18614d8c4dc009d875e")
   }
}

Observations:

  1. The secondary (227) became unreachable for some reason, as seen in the logs of the primary (88) and the arbiter (145) - "Error in heartbeat request to XXX.XX.XX.227:27017"
  2. The cause of the unreachability appears to be a network issue on the secondary (227) itself, as seen in its own log - "Failed to obtain address information for hostname ip-XXX-XX-XX-227: Name or service not known"

Questions:

  1. What could cause the secondary (227) to become unreachable?
  2. Why does an unreachable secondary cause the reachable primary to roll back?
  3. Is the replica set misconfigured in a way that caused the rollback during failover?
  4. Is there a way to avoid this in the future? (Other than the suggestions in the MongoDB documentation - https://docs.mongodb.com/manual/core/replica-set-rollbacks/)

EDIT: Attaching the syslog from the secondary (227)

May 11 06:01:38 ip-XXX-XX-XX-227 dhclient: bound to XXX.XX.XX.227 -- renewal in 1400 seconds.
May 11 06:24:58 ip-XXX-XX-XX-227 dhclient: DHCPREQUEST of XXX.XX.XX.227 on eth0 to XXX.XX.XX.1 port 67 (xid=0x777d16dc)
May 11 06:24:58 ip-XXX-XX-XX-227 dhclient: DHCPACK of XXX.XX.XX.227 from XXX.XX.XX.1
May 11 06:25:42 ip-XXX-XX-XX-227 dhclient: bound to XXX.XX.XX.227 -- renewal in 1660 seconds.
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.796093] INFO: task jbd2/xvda1-8:206 blocked for more than 120 seconds.
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.800781]       Not tainted 3.13.0-91-generic #138-Ubuntu
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.804393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809237] jbd2/xvda1-8    D ffff88003fc13180     0   206      2 0x00000000
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809241]  ffff880037689cb0 0000000000000046 ffff88003780c800 0000000000013180
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809243]  ffff880037689fd8 0000000000013180 ffff88003780c800 ffff880037689d98
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809244]  ffff8800377128b8 ffff88003780c800 ffff88001cade100 ffff880037689d80
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809246] Call Trace:
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809253]  [<ffffffff8172dd89>] schedule+0x29/0x70
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809258]  [<ffffffff8128ef27>] jbd2_journal_commit_transaction+0x287/0x1b90
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809261]  [<ffffffff8109fe28>] ? sched_clock_cpu+0xa8/0x100
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809263]  [<ffffffff810adad0>] ? prepare_to_wait_event+0x100/0x100
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809266]  [<ffffffff81076b7b>] ? lock_timer_base.isra.35+0x2b/0x50
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809268]  [<ffffffff81077a4f>] ? try_to_del_timer_sync+0x4f/0x70
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809270]  [<ffffffff81294a7d>] kjournald2+0xbd/0x240
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809272]  [<ffffffff810adad0>] ? prepare_to_wait_event+0x100/0x100
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809273]  [<ffffffff812949c0>] ? commit_timeout+0x10/0x10
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809276]  [<ffffffff8108dc79>] kthread+0xc9/0xe0
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809278]  [<ffffffff8108dbb0>] ? kthread_create_on_node+0x1c0/0x1c0
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809281]  [<ffffffff8173a3e8>] ret_from_fork+0x58/0x90
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809283]  [<ffffffff8108dbb0>] ? kthread_create_on_node+0x1c0/0x1c0
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.809288] INFO: task mongod:6576 blocked for more than 120 seconds.
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.813394]       Not tainted 3.13.0-91-generic #138-Ubuntu
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.817264] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.822652] mongod          D ffff88003fc13180     0  6576      1 0x00000000
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.822654]  ffff88003bfb9ae0 0000000000000082 ffff88003c42e000 0000000000013180
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.822656]  ffff88003bfb9fd8 0000000000013180 ffff88003c42e000 ffff880037712800
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.822657]  ffff880037712870 000000000086da6a 0000000000000002 ffff88003d07c030
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.822659] Call Trace:
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.822662]  [<ffffffff8172dd89>] schedule+0x29/0x70
May 11 06:25:42 ip-XXX-XX-XX-227 kernel: [1279440.822665]  [<ffffffff8128be35>] wait_transaction_locked+0x75/0xa0

  1. It probably restarted; check 227's own logs between 2018-05-11T06:04 and 2018-05-11T06:25.
  2. Because 88 received some writes while 227 was unreachable, and it lost the election after 227 rejoined the set:

2018-05-11T06:25:42.246+0000 I REPL     [ReplicationExecutor] VoteRequester: Got no vote from XXX.XX.XX.88:27017 because: candidate's data is staler than mine, resp:{ term: 2, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0 }

  3. I can't see anything wrong with it.
  4. Run a replica set with 3 data-bearing members. If 145 had been a proper (data-bearing) instance, 88 could still have replicated its writes to 145 when 227 became unreachable (a sketch follows below).
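
A hedged sketch of what moving from this primary-secondary-arbiter layout to three data-bearing members could look like in the mongo shell (the host comes from the question; the mongod on 145 would need its own data directory and an initial sync, which is not shown):

rs.remove("XXX.XX.XX.145:27017")   // drop the arbiter from the configuration
// restart the mongod on XXX.XX.XX.145 as a normal data-bearing node, then:
rs.add("XXX.XX.XX.145:27017")      // re-add it so it performs an initial sync and holds data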

According to the MongoDB documentation here, whenever the primary becomes unavailable, the secondary members trigger an election. The first member to receive a majority of the votes becomes the primary. The most important feature of replica set elections is that a majority of the set's original number of members must be present for the election to succeed. If you have a three-member replica set, the set can elect a primary as long as two or three members can connect to each other. If two members go offline, the remaining member stays a secondary.
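
For reference, the timing behind "Starting an election, since we've seen no PRIMARY in the past 10000ms" comes from the settings block in the configuration shown above; a quick way to inspect it from the mongo shell:

printjson(rs.conf().settings)
// heartbeatTimeoutSecs: 10 and electionTimeoutMillis: 10000 in this set:
// a secondary that has not seen a primary for ~10s starts an election, which
// only succeeds if a majority of the voting members (2 of 3 here) are reachable.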

What could cause the secondary (227) to become unreachable?

This is rare, and is usually caused by a network partition combined with replication lag.

Why does an unreachable secondary cause the reachable primary to roll back?

When the current primary steps down and an election is triggered, the mongod instance closes all client connections. This ensures that clients maintain an accurate view of the replica set and helps prevent rollbacks.
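
Going beyond the answer above, one way to keep such writes out of a rollback in the first place is to acknowledge them only once a majority of data-bearing members has them; a minimal sketch (the collection name and values are only examples):

db.orders.insert(
    { item: "example", qty: 1 },
    { writeConcern: { w: "majority", wtimeout: 5000 } }
)
// With the secondary unreachable, this write times out instead of being
// acknowledged, so an acknowledged write cannot later be lost to a rollback.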

Is the replica set misconfigured in a way that caused the rollback during failover?

I don't think you have a misconfiguration, since the replication helper commands (rs.status(), rs.isMaster()) work correctly. You can view the information for all of the nodes with rs.status().
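
A short sketch of the kind of check meant here, using fields that rs.status() reports for every member:

rs.status().members.forEach(function (m) {
    // name, replication state and health of each member as this node sees them
    print(m.name + "  " + m.stateStr + "  health=" + m.health);
});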

Is there a way to avoid this in the future?

You can configure replica set members in several ways, as listed here. In most cases, the members of a replica set have the default properties.

**Secondary-Only:** These members hold data but cannot become primary under any circumstances. See Secondary-Only Members.

**Hidden:** These members are invisible to client applications. See Hidden Members.

**Delayed:** These members apply operations from the primary's oplog after a specified delay. You can think of a delayed member as a form of "rolling backup". See Delayed Members.

**Arbiters:** These members hold no data and exist only to take part in elections. See Arbiters.

**Non-Voting:** These members do not vote in elections. Non-voting members are only used in larger sets with more than 12 members. See Non-Voting Members.
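
As a rough illustration of how these properties combine, a hidden, delayed member can be set up like this in the mongo shell (the array index 2 and the one-hour delay are only examples):

cfg = rs.conf()
cfg.members[2].priority = 0        // must not be electable as primary
cfg.members[2].hidden = true       // invisible to client applications
cfg.members[2].slaveDelay = 3600   // apply oplog entries one hour behind the primary
rs.reconfig(cfg)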

Secondary-Only Members

The secondary-only configuration prevents a secondary member of a replica set from becoming primary in a failover. You can set secondary-only mode for any member of the set except the current primary.

For example, you may want to configure all members of a replica set located outside of the main data center as secondary-only, to prevent those members from ever becoming primary.

To configure a member as secondary-only, set its priority value to 0. Any member with a priority equal to 0 will never seek election and cannot become primary under any circumstances. For more information on priority, see Member Priority.

Note: When updating the replica configuration object, address all members of the set by their index value in the array. Array indexes start at 0. Do not confuse this index value with the value of the _id field in each document of the members array; the _id rarely corresponds to the array index.
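
For instance, in the configuration shown earlier in this question the array indexes are 0, 1 and 2 while the _id values are 1, 2 and 3; a quick mongo shell check makes that mapping explicit:

cfg = rs.conf()
cfg.members.forEach(function (m, i) {
    // array index used for cfg.members[i] versus the member's _id field
    print("index " + i + " -> _id " + m._id + " (" + m.host + ")");
});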

As an example of modifying member priorities, assume a four-member replica set. Use the following sequence of operations in the mongo shell to modify the member priorities:

cfg = rs.conf()
cfg.members[0].priority = 2
cfg.members[1].priority = 1
cfg.members[2].priority = 0.5
cfg.members[3].priority = 0
rs.reconfig(cfg)

This reconfigures the set with the following priority settings:

  • Member 0 to a priority of 2, so that it becomes primary under most circumstances.
  • Member 1 to a priority of 1, which is the default value. Member 1 becomes primary if no member with a higher priority is eligible.
  • Member 2 to a priority of 0.5, which makes it less likely to become primary than the other members, but does not prohibit the possibility.
  • Member 3 to a priority of 0. Member 3 cannot become primary under any circumstances.

Notes:

  • If your replica set has an even number of members, add an arbiter to ensure that members can quickly obtain a majority of votes in an election for primary.
  • You cannot assign a priority of **0** to the current primary. If you want to prevent the current primary from being elected primary again, you must first step it down with rs.stepDown() and then set the appropriate priority with rs.reconfig() (a sketch follows below).
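
A minimal mongo shell sketch of that sequence, assuming the member to demote sits at array index 0 (adjust the index and the new priority for your own set):

rs.stepDown(60)              // ask the current primary to step down and stay secondary for 60s
// reconnect to the *new* primary before reconfiguring
cfg = rs.conf()
cfg.members[0].priority = 0  // the demoted member can no longer be elected primary
rs.reconfig(cfg)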

Quoted from: https://dba.stackexchange.com/questions/206664