Postgresql

postgresql 複製:WAL 在段可以歸檔之前旋轉?

  • December 1, 2020

使用 PostgreSQL 10.7

今天早上檢查我的日誌,我經常看到這個錯誤:

archive command failed with exit code 1
The failed archive command was: test ! -f /opt/bitnami/postgresql/archive/00000001000005D30000008B && cp pg_wal/00000001000005D30000008B /opt/bitnami/postgresql/archive/00000001000005D30000008B

我剛剛看到不推薦使用文件中的命令,我想知道是否有人可以指出更好的歸檔命令的方向。

同時,此段不在pg_wal,archive_statusarchive目錄中。我注意到日誌中有一些檢查點警告。

我們以前在複製伺服器上遇到過類似的問題,我認為打開存檔會有所幫助,因為由於某種原因它沒有啟用(現在是這樣),但現在看來問題出在其他地方。

數據庫確實每天都承受著沉重的負載,cronjob 將它們恢復為匹配生產數據庫,這似乎是發生檢查點錯誤的時候。

現在我唯一的選擇是創建一個新的基本備份,因為我至少缺少一個 WAL 段嗎?

這是相關部分postgresql.conf

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

wal_level = 'hot_standby'
                                       # (change requires restart)
#fsync = on                             # flush data to disk for crash safety
                                       # (turning this off can cause
                                       # unrecoverable data corruption)
#synchronous_commit = on                # synchronization level;
                                       # off, local, remote_write, remote_apply, or on
#wal_sync_method = fsync                # the default is the first option
                                       # supported by the operating system:
                                       #   open_datasync
                                       #   fdatasync (default on Linux)
                                       #   fsync
                                       #   fsync_writethrough
                                       #   open_sync
#full_page_writes = on                  # recover from partial page writes
#wal_compression = off                  # enable compression of full-page writes
#wal_log_hints = off                    # also do full page writes of non-critical updates
                                       # (change requires restart)
#wal_buffers = -1                       # min 32kB, -1 sets based on shared_buffers
                                       # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds
#wal_writer_flush_after = 1MB           # measured in pages, 0 disables

#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

# - Checkpoints -

#checkpoint_timeout = 5min              # range 30s-1d
max_wal_size = '400MB'
#min_wal_size = 80MB
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0
#checkpoint_flush_after = 0             # measured in pages, 0 disables
#checkpoint_warning = 30s               # 0 disables

# - Archiving -

archive_mode = on               # enables archiving; off, on, or always
                               # (change requires restart)
archive_command = 'test ! -f /opt/bitnami/postgresql/archive/%f && cp %p /opt/bitnami/postgresql/archive/%f'      # command to use to archive a logfile segment
                               # placeholders: %p = path of file to archive
                               #               %f = file name only
                               # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0            # force a logfile segment switch after this
                               # number of seconds; 0 disables

對於我能做些什麼來防止這種情況再次發生,我將不勝感激。

該文件應保留在 pg_wal 中,直到歸檔命令成功。如果存檔從未成功,則該文件應該仍然存在。也許您的文件系統已損壞並失去文件,或者您的員工正在搞亂他們不應該做的事情。

您使用的命令可能發生的情況是,如果它報告成功然後系統崩潰,並且當它恢復時存檔文件不存在或沒有正確的數據,因為它從未同步到磁碟。但這種情況不會在您的日誌中留下歸檔錯誤消息。

引用自:https://dba.stackexchange.com/questions/280664