Postgres replication from primary to standby server fails

January 24, 2022

I have a primary/standby Postgres cluster, and for availability reasons I want to add a new standby server.

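So I created a new server, installed Postgres, created the Postgres data filesystem, and then ran pg_basebackup to the new standby server. I tried many times, several from the primary and several from the first standby, and every attempt failed.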

pg_basebackup -D - -h localhost -U replicator -Ft --compress=0 --progress | pigz -p $THREADS | ssh -A postgres@$TARGETDB "pigz -dc - | tar xvf - --directory=/var/lib/pgsql/9.6/data/"

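When it finishes, I start Postgres and it fails with missing WAL and a diverged timeline. I'm fairly sure the requested WAL and history files don't even exist on the primary or on the secondary node.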

2022-01-24 11:32:00 GMT [17951]: [1-1] user=,db=,app=,client= LOG:  database system was interrupted while in recovery at log time 2022-01-24 11:10:02 GMT
2022-01-24 11:32:00 GMT [17951]: [2-1] user=,db=,app=,client= HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2022-01-24 11:32:01 GMT [17951]: [3-1] user=,db=,app=,client= LOG:  restored log file "00000007.history" from archive
ERROR: WAL file '00000008.history' not found in server '****' (SSH host: 10.154.129.90)
2022-01-24 11:32:01 GMT [17951]: [4-1] user=,db=,app=,client= LOG:  entering standby mode
2022-01-24 11:32:02 GMT [17951]: [5-1] user=,db=,app=,client= LOG:  restored log file "00000007.history" from archive
ERROR: WAL file '0000000700001C55000000F4' not found in server '****' (SSH host: 10.154.129.90)
2022-01-24 11:32:03 GMT [17951]: [6-1] user=,db=,app=,client= LOG:  restored log file "0000000600001C55000000F4" from archive
ERROR: WAL file '0000000700001C55000000F3' not found in server '****' (SSH host: 10.154.129.90)
2022-01-24 11:32:04 GMT [18134]: [1-1] user=postgres,db=postgres,app=[unknown],client=[local] FATAL:  the database system is starting up
2022-01-24 11:32:04 GMT [18145]: [1-1] user=postgres,db=postgres,app=[unknown],client=[local] FATAL:  the database system is starting up
2022-01-24 11:32:04 GMT [17951]: [7-1] user=,db=,app=,client= LOG:  restored log file "0000000600001C55000000F3" from archive
2022-01-24 11:32:04 GMT [17951]: [8-1] user=,db=,app=,client= FATAL:  requested timeline 7 is not a child of this server's history
2022-01-24 11:32:04 GMT [17951]: [9-1] user=,db=,app=,client= DETAIL:  Latest checkpoint is at 1C56/47CACF28 on timeline 6, but in the history of the requested timeline, the server forked off from that timeline at 1C3F/B7B96B90.
2022-01-24 11:32:04 GMT [17948]: [3-1] user=,db=,app=,client= LOG:  startup process (PID 17951) exited with exit code 1
2022-01-24 11:32:04 GMT [17948]: [4-1] user=,db=,app=,client= LOG:  aborting startup due to startup process failure
2022-01-24 11:32:04 GMT [17948]: [5-1] user=,db=,app=,client= LOG:  database system is shut down
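The FATAL/DETAIL pair comes down to an LSN comparison. Below is a minimal Python sketch, not PostgreSQL's actual code, that models the check. It assumes each line of a timeline history file has the form `parent_tli<TAB>switchpoint<TAB>reason`. The sample history is hypothetical and keeps only the timeline 6 entry that the DETAIL message reveals. The helper names `parse_lsn`, `parse_history` and `timeline_at` are made up for illustration.

```python
from __future__ import annotations


def parse_lsn(text: str) -> int:
    """Turn an 'XXXXXXXX/XXXXXXXX' LSN string into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_history(content: str) -> list[tuple[int, int]]:
    """Return (parent_tli, switchpoint_lsn) pairs from history file text."""
    entries = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        entries.append((int(fields[0]), parse_lsn(fields[1])))
    return entries


def timeline_at(lsn: int, history: list[tuple[int, int]], target_tli: int) -> int:
    """Timeline that owns `lsn` according to the target timeline's history.

    Each parent timeline owns everything before its switchpoint; anything at
    or past the last switchpoint belongs to the target timeline itself.
    """
    for parent_tli, switchpoint in history:
        if lsn < switchpoint:
            return parent_tli
    return target_tli


# Hypothetical 00000007.history reduced to the one line the DETAIL reveals.
history_7 = parse_history("6\t1C3F/B7B96B90\n")
checkpoint = parse_lsn("1C56/47CACF28")  # latest checkpoint, written on timeline 6
owner = timeline_at(checkpoint, history_7, target_tli=7)

# Timeline 7's history claims the checkpoint LSN is already on timeline 7,
# but the checkpoint was written on timeline 6, so 7 cannot be its child.
print(f"timeline 7 history assigns the checkpoint to TLI {owner}; control file says TLI 6")
print("timeline 7 is a valid child:", owner == 6)
```

1C56/47CACF28 lies past 1C3F/B7B96B90. That means the base backup's checkpoint sits after the point where timeline 7 supposedly branched off from timeline 6, which is the situation the DETAIL line describes.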

We also archive WAL to a Barman server, so WAL lost during the backup is not a plausible cause.

recovery.conf file

standby_mode = 'on'
primary_conninfo = 'user=replicator password=C0D5wallop host=$PRIMARYSERVER port=5432 sslmode=prefer sslcompression=1'
trigger_file = '/var/lib/pgsql/9.6/boo'
recovery_target_timeline='latest'
restore_command = 'ssh -o StrictHostKeyChecking=no barman@$BARMANSERVER barman get-wal db-44 %f > %p'
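This is how the restore_command above reaches Barman. Before running it, PostgreSQL replaces `%f` with the name of the file it wants (a WAL segment or a `.history` file), `%p` with the path to copy it to, and `%%` with a literal `%`. The Python sketch below is a simplified model of that substitution and ignores `%r` and other details. The function name is hypothetical, and the target path in the example is a placeholder, not the exact path PostgreSQL 9.6 uses.

```python
import re


def expand_restore_command(template: str, wal_file: str, target_path: str) -> str:
    """Simplified model of restore_command placeholder substitution."""
    def substitute(match):
        code = match.group(1)
        if code == "f":
            return wal_file       # file PostgreSQL is asking for
        if code == "p":
            return target_path    # where the command must write it
        if code == "%":
            return "%"            # escaped literal percent sign
        return match.group(0)     # leave other sequences (e.g. %r) untouched here
    return re.sub(r"%(.)", substitute, template)


template = ("ssh -o StrictHostKeyChecking=no barman@$BARMANSERVER "
            "barman get-wal db-44 %f > %p")
# "/tmp/target" is a placeholder path for illustration only.
for name in ("00000008.history", "0000000700001C55000000F4"):
    print(expand_restore_command(template, name, "/tmp/target"))
```

This appears to be why the log shows Barman being asked for 00000008.history. With recovery_target_timeline='latest', the standby probes for newer timelines, and each probe becomes one `barman get-wal` call.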

Happy to provide more information. Thanks for your help.

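In the end, it turned out that I had once run a **** test restore on a new database instance without removing the archive_command from its postgresql.conf. That instance then archived an empty 00000007.history timeline file.
So when the new server fetches archived WAL from Barman, it finds the dummy 00000007.history timeline file. Barman has no actual xlog segments for that timeline, which produces the error log above.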
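One way to spot this kind of leftover is to look for `.history` files whose timeline has no WAL segments in the archive. The Python sketch below works on a flat list of file names; collecting them, e.g. by walking Barman's wals/ directory, is up to you. It assumes standard, uncompressed 24-hex-digit segment names whose first 8 digits are the timeline ID. The function name and the sample listing (built from file names in the log above) are hypothetical. It only flags candidates for manual inspection: a freshly created timeline can briefly have a history file before its first segment is archived.

```python
import re

SEGMENT_RE = re.compile(r"^([0-9A-F]{8})[0-9A-F]{16}$")  # TLI + log + segment
HISTORY_RE = re.compile(r"^([0-9A-F]{8})\.history$")


def orphan_history_files(names):
    """Return .history files whose timeline has no archived WAL segment."""
    timelines_with_wal = set()
    history_files = {}
    for name in names:
        seg = SEGMENT_RE.match(name)
        if seg:
            timelines_with_wal.add(int(seg.group(1), 16))
            continue
        hist = HISTORY_RE.match(name)
        if hist:
            history_files[name] = int(hist.group(1), 16)
    return sorted(n for n, tli in history_files.items()
                  if tli not in timelines_with_wal)


archive = [
    "00000006.history",
    "0000000600001C55000000F3",
    "0000000600001C55000000F4",
    "00000007.history",
]
print(orphan_history_files(archive))
```

For a real archive, the list could come from something like `[f for _, _, files in os.walk(wals_dir) for f in files]`, where `wals_dir` is a hypothetical path to the server's wals/ directory.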
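Solution:
Connect to the Barman server.
Manually move the 00000007.history file out of the way.
Manually delete the 00000007.history line from xlog.db in the wals/ directory on the Barman server.
Restart Postgres on the secondary node.
Recommendation: back up anything you change on the Barman server before doing this.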
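The xlog.db step could be scripted roughly as follows. This is a hedged Python sketch, not a Barman tool. It assumes xlog.db is a plain text file with one entry per line and the WAL file name as the first whitespace-separated field, so check that against your Barman version first. In line with the recommendation to back up first, it writes a `.bak` copy before changing anything. The function name, the demo paths and the demo entry contents are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def drop_xlogdb_entry(xlogdb: Path, wal_name: str) -> int:
    """Remove lines for `wal_name` from xlog.db, keeping a .bak copy first.

    Assumes one entry per line with the WAL name as the first field.
    Returns the number of lines removed.
    """
    backup = xlogdb.with_name(xlogdb.name + ".bak")
    shutil.copy2(xlogdb, backup)  # back up before touching anything
    lines = xlogdb.read_text().splitlines(keepends=True)
    kept = [line for line in lines
            if not line.split() or line.split()[0] != wal_name]
    xlogdb.write_text("".join(kept))
    return len(lines) - len(kept)


# Demo on a throwaway file; the entry contents are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "xlog.db"
    db.write_text("00000006.history\t250\t1642000000.0\tNone\n"
                  "00000007.history\t42\t1642000001.0\tNone\n")
    removed = drop_xlogdb_entry(db, "00000007.history")
    print(removed, db.read_text().count("00000007.history"))
```

Moving the 00000007.history file itself and restarting Postgres on the standby remain manual steps.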

Source: https://dba.stackexchange.com/questions/306490
