Postgresql
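Postgres replication from primary to standby server fails
I have a primary/standby Postgres cluster, and for availability reasons I want to add a new standby server.
So I created a new server, installed Postgres, created the Postgres data file system, and then ran pg_basebackup to the new standby (I tried many times, some from the primary and some from the first standby, and all failed).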
pg_basebackup -D - -h localhost -U replicator -Ft --compress=0 --progress | pigz -p $THREADS | ssh -A postgres@$TARGETDB "pigz -dc - | tar xvf - --directory=/var/lib/pgsql/9.6/data/"
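When it finished, I started Postgres and it failed, complaining about missing WAL and a diverging timeline, although I am fairly sure the requested WAL and history files do not exist on either the primary or the secondary node.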
2022-01-24 11:32:00 GMT [17951]: [1-1] user=,db=,app=,client= LOG: database system was interrupted while in recovery at log time 2022-01-24 11:10:02 GMT
2022-01-24 11:32:00 GMT [17951]: [2-1] user=,db=,app=,client= HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2022-01-24 11:32:01 GMT [17951]: [3-1] user=,db=,app=,client= LOG: restored log file "00000007.history" from archive
ERROR: WAL file '00000008.history' not found in server '****' (SSH host: 10.154.129.90)
2022-01-24 11:32:01 GMT [17951]: [4-1] user=,db=,app=,client= LOG: entering standby mode
2022-01-24 11:32:02 GMT [17951]: [5-1] user=,db=,app=,client= LOG: restored log file "00000007.history" from archive
ERROR: WAL file '0000000700001C55000000F4' not found in server '****' (SSH host: 10.154.129.90)
2022-01-24 11:32:03 GMT [17951]: [6-1] user=,db=,app=,client= LOG: restored log file "0000000600001C55000000F4" from archive
ERROR: WAL file '0000000700001C55000000F3' not found in server '****' (SSH host: 10.154.129.90)
2022-01-24 11:32:04 GMT [18134]: [1-1] user=postgres,db=postgres,app=[unknown],client=[local] FATAL: the database system is starting up
2022-01-24 11:32:04 GMT [18145]: [1-1] user=postgres,db=postgres,app=[unknown],client=[local] FATAL: the database system is starting up
2022-01-24 11:32:04 GMT [17951]: [7-1] user=,db=,app=,client= LOG: restored log file "0000000600001C55000000F3" from archive
2022-01-24 11:32:04 GMT [17951]: [8-1] user=,db=,app=,client= FATAL: requested timeline 7 is not a child of this server's history
2022-01-24 11:32:04 GMT [17951]: [9-1] user=,db=,app=,client= DETAIL: Latest checkpoint is at 1C56/47CACF28 on timeline 6, but in the history of the requested timeline, the server forked off from that timeline at 1C3F/B7B96B90.
2022-01-24 11:32:04 GMT [17948]: [3-1] user=,db=,app=,client= LOG: startup process (PID 17951) exited with exit code 1
2022-01-24 11:32:04 GMT [17948]: [4-1] user=,db=,app=,client= LOG: aborting startup due to startup process failure
2022-01-24 11:32:04 GMT [17948]: [5-1] user=,db=,app=,client= LOG: database system is shut down
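As a reading aid for the log: a WAL segment file name is 24 hex digits, made of an 8-digit timeline ID, an 8-digit log number (the high half of the LSN), and an 8-digit segment number. The sketch below splits a name into those fields, which makes it easier to see which timeline each requested file belongs to. decode_wal_name is a hypothetical helper written for illustration, not a PostgreSQL or Barman command.

```shell
# Split a 24-character WAL segment file name into its three 8-hex-digit fields.
# decode_wal_name is a hypothetical helper, not part of PostgreSQL or Barman.
decode_wal_name() {
    name=$1
    tli_hex=$(printf '%s' "$name" | cut -c1-8)    # timeline ID
    log_hex=$(printf '%s' "$name" | cut -c9-16)   # high 32 bits of the LSN
    seg_hex=$(printf '%s' "$name" | cut -c17-24)  # segment number within that log
    printf 'timeline=%d log=%X seg=%X\n' \
        "$((0x$tli_hex))" "$((0x$log_hex))" "$((0x$seg_hex))"
}

decode_wal_name 0000000700001C55000000F4   # timeline=7 log=1C55 seg=F4
decode_wal_name 0000000600001C55000000F4   # timeline=6 log=1C55 seg=F4
```

Read this way, the log shows the server asking for the same segment on timeline 7 first (not found) and then falling back to timeline 6 (restored).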
Also, we archive WAL to a Barman server, so losing WAL during the backup is not a likely cause either.
recovery.conf file
standby_mode = 'on'
primary_conninfo = 'user=replicator password=C0D5wallop host=$PRIMARYSERVER port=5432 sslmode=prefer sslcompression=1'
trigger_file = '/var/lib/pgsql/9.6/boo'
recovery_target_timeline='latest'
restore_command = 'ssh -o StrictHostKeyChecking=no barman@$BARMANSERVER barman get-wal db-44 %f > %p'
I am happy to provide more information. Thanks for your help.
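In the end, it turned out that I had once done a test recovery on a new database instance without removing archive_command from its postgresql.conf file, and that instance ended up archiving an empty 00000007.history timeline file.
So when the new server tries to fetch archived logs from Barman, it finds this bogus 00000007.history file, but the Barman server holds no actual xlog for that timeline, which produces the error log above.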
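The FATAL/DETAIL pair in the log can be checked by hand. With recovery_target_timeline='latest', the standby tries to follow timeline 7 because a history file for it exists in the archive, but its own latest checkpoint on timeline 6 lies past the point where that history file says timeline 7 branched off. A minimal sketch of that comparison, using the two LSNs from the log (lsn_to_int is a hypothetical helper):

```shell
# Convert an LSN such as 1C56/47CACF28 to a single integer for comparison.
# lsn_to_int is a hypothetical helper, not a PostgreSQL command.
lsn_to_int() {
    hi=${1%%/*}   # part before the slash: high 32 bits
    lo=${1#*/}    # part after the slash: low 32 bits
    echo $(( (0x$hi << 32) + 0x$lo ))
}

checkpoint=1C56/47CACF28   # latest checkpoint of the new standby, on timeline 6
fork_point=1C3F/B7B96B90   # where the archived history file says timeline 7 forked off

if [ "$(lsn_to_int "$checkpoint")" -gt "$(lsn_to_int "$fork_point")" ]; then
    echo "checkpoint is past the fork point: timeline 7 is not reachable from this data"
else
    echo "checkpoint is at or before the fork point: timeline 7 could be followed"
fi
```

With these values the first branch is taken, which matches the "requested timeline 7 is not a child of this server's history" message.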
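Solution:
- Connect to the Barman server.
- Manually move the 00000007.history file out of the archive.
- Manually delete the 00000007.history line from xlog.db in the wals/ directory on the Barman server.
- Restart Postgres on the secondary node.
Recommendation: back up anything you are about to change on the Barman server before changing it.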
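The steps on the Barman server could be scripted roughly as follows. This is only a sketch under assumptions: remove_stray_history is a hypothetical helper, it assumes the history file sits directly in the wals/ directory and that each xlog.db line starts with the file name, and the path in the example call is a guess at the Barman layout for the db-44 server. Check all of these against your own installation before running anything.

```shell
# Hypothetical helper: set aside a stray timeline history file in a Barman
# wals/ directory and drop its entry from xlog.db, keeping a backup of both.
# Arguments: <wals directory> <history file name> <backup directory>
remove_stray_history() {
    wals_dir=$1
    history=$2
    backup_dir=$3

    mkdir -p "$backup_dir" || return 1
    # Back up xlog.db before changing it.
    cp -p "$wals_dir/xlog.db" "$backup_dir/xlog.db.bak" || return 1
    # Move the history file out of the archive rather than deleting it.
    mv "$wals_dir/$history" "$backup_dir/" || return 1
    # Rewrite xlog.db without the entry whose first field is the history file name.
    awk -v name="$history" '$1 != name' "$backup_dir/xlog.db.bak" > "$wals_dir/xlog.db"
}

# Example call, left commented out; the wals path is an assumption.
# remove_stray_history /var/lib/barman/db-44/wals 00000007.history /var/tmp/barman-fix
```

After that, restart Postgres on the new standby as in the last step above. Barman also has a rebuild-xlogdb command that regenerates xlog.db from the files on disk, which may be an alternative to editing the file by hand.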