Postgresql

事務送出前恢復停止 - Postgres 11 PITR

  • August 17, 2021

我正在嘗試在重複節點上進行 PITR。這是我正在遵循的步驟。

  1. 在源節點上開始備份。
select pg_start_backup('backup')
  1. 在目標節點上使用這些 rsync 命令複製目標節點上的所有文件:
rsync  -av sourcedb:/TPINFO01/wal_archive/ /TPINFO01/wal_archive
rsync --delete -av sourcedb:/TPINFO01/datadg/tablespace/ /TPINFO01/datadg/tablespace
rsync --delete -av sourcedb:/TPINFO01/datadg/data/ /TPINFO01/datadg/data --exclude 'pg_log' --exclude 'pg_replslot' --exclude 'postgresql.conf' --exclude 'recovery.conf' 
  1. 停止源節點上的備份
select pg_stop_backup();
  1. 通過 crontab,每隔一分鐘從源數據庫中複製 WAL 文件。在源數據庫上啟用了連續的 WAL 歸檔。
* * * 08 * /home/postgres/rsync_wal.sh > /dev/null 2>&1

rsync_wal.sh

#!/bin/sh

rsync -av sourcedb:/TPINFO01/wal_archive/ /TPINFO01/wal_archive
  1. 第二天,我recovery.conf在 targetdb 上創建如下:
recovery_target_time = '2021-08-16 02:00:00 JST'
recovery_target_action = 'pause'
recovery_target_inclusive = 'false'
restore_command = 'rsync -a /TPINFO01/wal_archive/%f %p'
  1. 它恢復文件但因以下錯誤而停止:
2021-08-16 16:33:52.853 JST,,,3608,,611a14e0.e18,1,,2021-08-16 16:33:52 JST,,0,LOG,00000,"database system was interrupted while in recovery at log time 2021-08-16 02:27:59 JST",,"If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.",,,,,,,""
2021-08-16 16:33:53.023 JST,,,3608,,611a14e0.e18,2,,2021-08-16 16:33:52 JST,,0,LOG,00000,"starting point-in-time recovery to 2021-08-16 02:00:00+09",,,,,,,,,""
2021-08-16 16:33:53.120 JST,,,3608,,611a14e0.e18,3,,2021-08-16 16:33:52 JST,,0,LOG,00000,"restored log file ""0000000100009928000000A7"" from archive",,,,,,,,,""
2021-08-16 16:33:53.144 JST,,,3608,,611a14e0.e18,4,,2021-08-16 16:33:52 JST,1/0,0,LOG,00000,"redo starts at 9928/A73B0DA8",,,,,,,,,""
2021-08-16 16:33:53.145 JST,,,3608,,611a14e0.e18,5,,2021-08-16 16:33:52 JST,1/0,0,LOG,00000,"recovery stopping before commit of transaction 74385349, time 2021-08-16 02:30:01.469553+09",,,,,,,,,""
2021-08-16 16:33:53.145 JST,,,3608,,611a14e0.e18,6,,2021-08-16 16:33:52 JST,1/0,0,FATAL,XX000,"requested recovery stop point is before consistent recovery point",,,,,,,,,""

這裡有什麼問題?目標節點上的數據庫甚至沒有啟動時如何損壞?我錯過了一些步驟嗎?

這是備份文件的樣子

START WAL LOCATION: 9928/9D000028 (file 00000001000099280000009D)
STOP WAL LOCATION: 9928/9D000168 (file 00000001000099280000009D)
CHECKPOINT LOCATION: 9928/9D000098
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2021-08-15 21:58:00 JST
LABEL: backup
START TIMELINE: 1
STOP TIME: 2021-08-15 22:03:44 JST
STOP TIMELINE: 1

正如 Laurenz Albe 在評論中提到的,我在pg_stop_backup().

我重複了上述步驟,唯一的區別是,將 WAL 段從源數據庫複製到目標數據庫的 cronjob 在備份期間連續執行,以確保不會失去任何 WAL 段。

它以這種方式工作,我可以在我想要的時間做 PITR。

您的 WAL 歸檔似乎無法可靠執行。

rsync與其使用from cron 作業來複製 WAL 段,這似乎偶爾會失去段,不如使用 PostgreSQLarchive_command來歸檔 WAL 段。然後 PostgreSQL 將確保所有段都被歸檔。

引用自:https://dba.stackexchange.com/questions/298087