PostgreSQL

Promoting a hot standby to master with streaming replication on PostgreSQL 9.4

  • October 6, 2015

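I currently have two database servers running CentOS 6 and PostgreSQL 9.4 (identical replicas on the same network). They are set up as a master and a backup that was created from the master with pg_basebackup. The master copies WAL files to the secondary server's filesystem, and the backup is kept up to date from those WAL files and connects to the master using streaming replication.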

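This all works fine until I kill the master and try to promote the backup to be the new master. I run kill -9 to simulate a catastrophic failure of the master, then shortly afterwards touch the trigger file on the backup.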

The problem is that as soon as I touch the trigger file on the backup server, it detects the file and then fails to start as a master.

Here is the log from the backup after it has been disconnected from the master database:

< 2015-10-05 14:20:52.888 BST >LOG:  trigger file found: /tmp/psql.trigger
< 2015-10-05 14:20:52.888 BST >LOG:  redo is not required
< 2015-10-05 14:20:52.904 BST >LOG:  restored log file "000000010000000000000004" from archive
< 2015-10-05 14:20:52.906 BST >PANIC:  record with zero length at 0/40000C8
< 2015-10-05 14:20:52.906 BST >LOG:  startup process (PID 3101) was terminated by signal 6: Aborted
< 2015-10-05 14:20:52.906 BST >LOG:  terminating any other active server processes
< 2015-10-05 14:21:08.580 BST >LOG:  database system was interrupted while in recovery at log time 2015-10-05 13:50:08 BST
< 2015-10-05 14:21:08.580 BST >HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
< 2015-10-05 14:21:08.935 BST >LOG:  entering standby mode
< 2015-10-05 14:21:08.947 BST >LOG:  restored log file "000000010000000000000004" from archive
< 2015-10-05 14:21:08.949 BST >LOG:  record with zero length at 0/40000C8
< 2015-10-05 14:21:08.951 BST >FATAL:  could not connect to the primary server: could not connect to server: Connection refused
       Is the server running on host "192.168.56.101" and accepting
       TCP/IP connections on port 5432?

The 13:50 log time refers only to the server starting up in recovery mode.

The configuration in postgresql.conf on the master is:

listen_addresses = '*'
wal_level = archive
max_wal_senders = 2
archive_mode = on
archive_command = 'rsync -a %p postgres@192.168.56.102:/apps/pgsql/9.4/write_ahead_logs/%f'
archive_timeout = 600

The configuration in recovery.conf on the backup is:

standby_mode = 'on'
primary_conninfo = 'host=192.168.56.101 port=5432 user=backups password=abc'
restore_command = 'cp /apps/pgsql/9.4/write_ahead_logs/%f %p'
archive_cleanup_command = 'pg_archivecleanup /apps/pgsql/9.4/write_ahead_logs %r'
trigger_file = '/tmp/psql.trigger'

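I'm sure I'm missing something important, but after searching the internet and the documentation I can't find it. Any help would be greatly appreciated.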

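Solved. The problem was that I had Monit running in the background, and it started the backup database before I had written recovery.conf (the servers are provisioned by Chef).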
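One way to guard against this kind of race is to have the process supervisor check for recovery.conf before it starts the standby. The following is only a minimal sketch: the function name standby_ready and the idea of calling it from a Monit or init start script are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical guard (not from the original post): succeeds only if the
# given data directory already contains a recovery.conf that turns on
# standby mode. A start script could call this before launching postgres.
standby_ready() {
    data_dir="$1"
    conf="$data_dir/recovery.conf"

    # No recovery.conf yet: starting now would bring the server up
    # outside recovery mode.
    [ -f "$conf" ] || return 1

    # Require an uncommented "standby_mode = on" (quotes optional).
    grep -Eq "^[[:space:]]*standby_mode[[:space:]]*=[[:space:]]*'?on'?" "$conf"
}

# Example use in a start script (data directory path is an assumption):
#   standby_ready /var/lib/pgsql/9.4/data || exit 1
```

With a check like this in place, a supervisor that fires too early would fail to start the server instead of starting it as an independent primary.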

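The solution was to pass the -R flag to pg_basebackup, which ensures the database never starts outside recovery mode. Starting outside recovery mode would make it incompatible with the WAL and with streaming replication from the master.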
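As a rough illustration, the invocation might look like the sketch below. The helper basebackup_cmd is hypothetical and only prints the command so it can be reviewed before running; the host and user are taken from the recovery.conf above, and the data directory is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): prints a
# pg_basebackup command line that includes -R, so that a recovery.conf
# is written into the new data directory as part of the base backup.
basebackup_cmd() {
    primary_host="$1"   # master to copy from
    repl_user="$2"      # user allowed to make replication connections
    data_dir="$3"       # empty target data directory on the standby

    printf 'pg_basebackup -h %s -p 5432 -U %s -D %s -R\n' \
        "$primary_host" "$repl_user" "$data_dir"
}

# Example (data directory path is an assumption):
#   basebackup_cmd 192.168.56.101 backups /var/lib/pgsql/9.4/data
```

In 9.4 the recovery.conf that -R generates contains only standby_mode and primary_conninfo, so settings such as restore_command, archive_cleanup_command and trigger_file from the configuration above would still need to be added afterwards.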

Source: https://dba.stackexchange.com/questions/117157