Replication

MongoDB replica set InitialSyncOplogSourceMissing error

  • July 20, 2020

I am setting up a 3-node MongoDB replica set with Mongo 3.6.12, with one of the nodes in AWS. There is roughly 400GB of data to be synced to the secondary node. The initial sync starts fine; however, after the data sync has been running for about a day, my AWS node gets an InitialSyncOplogSourceMissing error:

Initial Sync Attempt Statistics: { failedInitialSyncAttempts: 9, maxFailedInitialSyncAttempts: 10, initialSyncStart: new Date(1565682109467), initialSyncAttempts: [ { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" } ] }
2019-08-13T08:43:28.775+0100 E REPL     [replication-1] Initial sync attempt failed -- attempts left: 0 cause: InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.
2019-08-13T08:43:28.775+0100 F REPL     [replication-1] The maximum number of retries have been exhausted for initial sync.
2019-08-13T08:43:28.775+0100 D EXECUTOR [replication-1] Executing a task on behalf of pool replication
2019-08-13T08:43:28.775+0100 D EXECUTOR [replication-0] Not reaping because the earliest retirement date is 2019-08-13T08:43:58.775+0100
2019-08-13T08:43:28.775+0100 D STORAGE  [replication-1] dropCollection: local.temp_oplog_buffer
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] dropCollection: local.temp_oplog_buffer - dropAllIndexes start
2019-08-13T08:43:28.776+0100 D INDEX    [replication-1]      dropAllIndexes dropping: { v: 2, key: { _id: 1 }, name: "_id_", ns: "local.temp_oplog_buffer" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] local.temp_oplog_buffer: clearing plan cache - collection info cache reset
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] WT begin_transaction for snapshot id 273
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [ { spec: { v: 2, key: { _id: 1 }, name: "_id_", ns: "local.temp_oplog_buffer" }, ready: true, multikey: false, multikeyPaths: { _id: BinData(0, 00) }, head: 0, prefix: -1 } ], prefix: -1 }, idxIdent: { _id_: "local/index-21-7814325343802347057" }, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [ { spec: { v: 2, key: { _id: 1 }, name: "_id_", ns: "local.temp_oplog_buffer" }, ready: true, multikey: false, multikeyPaths: { _id: BinData(0, 00) }, head: 0, prefix: -1 } ], prefix: -1 }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] recording new metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] dropCollection: local.temp_oplog_buffer - dropAllIndexes done
2019-08-13T08:43:28.777+0100 I STORAGE  [replication-1] Finishing collection drop for local.temp_oplog_buffer (no UUID).
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] deleting metadata for local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] WT commit_transaction for snapshot id 273
2019-08-13T08:43:28.780+0100 D STORAGE  [replication-1] WT drop of  table:local/index-21-7814325343802347057 res 0
2019-08-13T08:43:28.780+0100 D STORAGE  [replication-1] ~WiredTigerRecordStore for: local.temp_oplog_buffer
2019-08-13T08:43:28.782+0100 D STORAGE  [replication-1] WT drop of  table:local/collection-20-7814325343802347057 res 0
2019-08-13T08:43:28.782+0100 E REPL     [replication-1] Initial sync failed, shutting down now. Restart the server to attempt a new initial sync.
2019-08-13T08:43:28.782+0100 F -        [replication-1] Fatal assertion 40088 InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync. at src\mongo\db\repl\replication_coordinator_impl.cpp 711
2019-08-13T08:43:28.782+0100 F -        [replication-1] 

Restarting the mongod instance immediately runs into the same error. If I force a replica set reconfiguration, all the data already synced to this secondary is dropped and another initial sync starts from scratch. What is the best course of action for this InitialSyncOplogSourceMissing error?
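
For reference, one way to check whether this secondary can see a valid sync source at all is to look at the member states it reports. A minimal mongo shell sketch (run on the stuck secondary; it only prints what is already in rs.status() and rs.conf()):

// Which members does this node see, and in what state?
rs.status().members.forEach(function (m) {
    print(m.name + "  state: " + m.stateStr + "  health: " + m.health);
});

// A misconfigured or unreachable host in the config can also lead to
// "No valid sync source found".
printjson(rs.conf().members.map(function (m) { return m.host; }));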

I managed to get this replica set up in the end. It turned out that a program on our AWS instance was restarting the mongod instance every day. Because of network latency, syncing all the data takes roughly two days, so this error shows up when mongod is forced to shut down in the middle of the initial data sync. This may be something worth looking into on MongoDB's side.
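
If a scheduled job really must restart mongod, one option is to have it skip the restart while a member is still performing its initial sync. This is only a sketch of that idea in the mongo shell, relying on the fact that a member stays in the STARTUP2 state until initial sync completes:

// Run against any reachable member before a scheduled restart.
var syncing = rs.status().members.filter(function (m) {
    return m.stateStr === "STARTUP2";
});
if (syncing.length > 0) {
    print("Initial sync still in progress on: " +
          syncing.map(function (m) { return m.name; }).join(", "));
    // The restart job should back off here.
} else {
    print("No member is in initial sync; restarting mongod is safe.");
}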

Have you tried restoring from a backup, as suggested in this JIRA ticket? https://jira.mongodb.org/browse/SERVER-32199

This could be caused by disk corruption. Also try switching your primary (the sync source) and then start a full resync.
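
A minimal sketch of that suggestion in the mongo shell; the 120-second step-down window is just an illustrative value:

// Run on the current primary: force it to step down so the recovering node
// ends up choosing a different member as its sync source. It will not seek
// re-election for 120 seconds.
rs.stepDown(120);

// The full resync itself happens outside the shell: stop mongod on the broken
// secondary, empty its dbPath directory, and start mongod again so it begins
// a fresh initial sync.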

Thanks!

Quoted from: https://dba.stackexchange.com/questions/245242