Postgresql

高可用的 PostgreSQL 與讚助人

  • February 11, 2021

我按照本指南為 Postgres HA 設置實驗室。

我完全按照指南(在我的情況下更改 IP 地址),畢竟這一切都在 Postgres 伺服器 1 上工作

但是當談到 Postgres server 2 patroni.ymlsetup

該指南說在兩個 Postgres 伺服器上具有相同的patroni.yml設置,但是當重新啟動patroni service

這個問題發生在 server1

quanlm@DB1:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
  Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
  Active: active (running) since Tue 2019-11-12 07:35:33 UTC; 14min ago
Main PID: 411 (patroni)
   Tasks: 12
  Memory: 77.6M
     CPU: 4.041s
  CGroup: /system.slice/patroni.service
          ├─411 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
          ├─431 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.122.77 --max_prepared_tran
          ├─435 postgres: postgres: checkpointer process                                                                                   
          ├─436 postgres: postgres: writer process                                                                                         
          ├─439 postgres: postgres: stats collector process                                                                                
          ├─447 postgres: postgres: postgres postgres 192.168.122.77(49984) idle                                                           
          ├─455 postgres: postgres: wal writer process                                                                                     
          └─456 postgres: postgres: autovacuum launcher process                                                                            

Nov 12 07:49:28 DB1 patroni[411]: 2019-11-12 07:49:28,533 INFO: no action.  i am the leader with the lock
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,536 INFO: no action.  i am the leader with the lock
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,544 INFO: no action.  i am the leader with the lock
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,458 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,548 INFO: no action.  i am the leader with the lock
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,457 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,539 INFO: no action.  i am the leader with the lock
Nov 12 07:50:19 DB1 patroni[411]: 2019-11-12 07:50:19,949 INFO: acquired session lock as a leader

是的,伺服器 1 很好,但在伺服器 2 上

quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
  Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
  Active: failed (Result: exit-code) since Tue 2019-11-12 07:50:02 UTC; 2s ago
 Process: 9514 ExecStart=/usr/local/bin/patroni /etc/patroni.yml (code=exited, status=1/FAILURE)
Main PID: 9514 (code=exited, status=1/FAILURE)

Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/socketserver.py", line 440, in __init__
Nov 12 07:50:02 DB2 patroni[9514]:     self.server_bind()
Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/http/server.py", line 138, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]:     socketserver.TCPServer.server_bind(self)
Nov 12 07:50:02 DB2 patroni[9514]:   File "/usr/lib/python3.5/socketserver.py", line 454, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]:     self.socket.bind(self.server_address)
Nov 12 07:50:02 DB2 patroni[9514]: OSError: [Errno 99] Cannot assign requested address
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Main process exited, code=exited, status=1/FAILURE
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Unit entered failed state.
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Failed with result 'exit-code'.

畢竟它不起作用。

我允許通過編輯和listen_addresses = '*'遠端連接兩個伺服器postgresql.conf

host all all 0.0.0.0/0 md5

pg_hba.conf

所以當HAproxy上班的時候,第1台宕機的時候,第2台伺服器沒有上去。

問題肯定在patroni伺服器 2 上,但如何解決呢?

否則,有什麼辦法可以繞過 HA postgresql 伺服器?

P/s:防火牆設置

quanlm@DB1:~$ sudo ufw status
Status: inactive
quanlm@DB1:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination  
quanlm@DB2:~$ sudo ufw status
Status: inactive
quanlm@DB2:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination  

P/ss: 我的patroni.yml

quanlm@DB1:~$ cat /etc/patroni.yml 
scope: postgres
namespace: /db/
name: postgresql0

restapi:
   listen: 192.168.122.77:8008
   connect_address: 192.168.122.77:8008

etcd:
   host: 192.168.122.156:2379

bootstrap:
   dcs:
       ttl: 30
       loop_wait: 10
       retry_timeout: 10
       maximum_lag_on_failover: 1048576
       postgresql:
           use_pg_rewind: true

   initdb:
   - encoding: UTF8
   - data-checksums

   pg_hba:
   - host replication replicator 127.0.0.1/32 md5
   - host replication replicator 192.168.122.77/0 md5
   - host replication replicator 192.168.122.240/0 md5
   - host all all 0.0.0.0/0 md5

   users:
       admin:
           password: admin
           options:
               - createrole
               - createdb

postgresql:
   listen: 192.168.122.77:5432
   connect_address: 192.168.122.77:5432
   data_dir: /data/patroni
   pgpass: /tmp/pgpass
   authentication:
       replication:
           username: replicator
           password: password
       superuser:
           username: postgres
           password: password
   parameters:
       unix_socket_directories: '.'

tags:
   nofailover: false
   noloadbalance: false
   clonefrom: false
   nosync: false

是的,在兩台伺服器上

更新#1


patroni.yml 有零錢name: postgresql0->name: postgresqp1

其餘 api 設置為主機 IP ‘192.168.122.240’

但是一個

postgresql:
   listen: 192.168.122.77:5432
   connect_address: 192.168.122.77:5432

發生了這個問題:

quanlm@DB2:~⟫ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
  Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
  Active: active (running) since Wed 2019-11-13 02:17:16 UTC; 25s ago
Main PID: 32363 (patroni)
   Tasks: 6
  Memory: 45.7M
     CPU: 6.326s
  CGroup: /system.slice/patroni.service
          ├─ 1014 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --port=5432 --wal_level=hot_standby --max_wal_senders=10 --cluster
          └─32363 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml

Nov 13 02:17:39 DB2 patroni[32363]: 192.168.122.77:5432 - accepting connections
Nov 13 02:17:39 DB2 patroni[32363]: 192.168.122.77:5432 - accepting connections
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: Lock owner: postgresql0; I am postgresql1
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: does not have lock
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: establishing a new patroni connection to the postgres cluster
Nov 13 02:17:40 DB2 patroni[32363]: LOG:  could not bind IPv4 socket: Cannot assign requested address
Nov 13 02:17:40 DB2 patroni[32363]: HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
Nov 13 02:17:40 DB2 patroni[32363]: WARNING:  could not create listen socket for "192.168.122.77"
Nov 13 02:17:40 DB2 patroni[32363]: FATAL:  could not create any TCP/IP sockets
Nov 13 02:17:40 DB2 patroni[32363]: 2019-11-13 02:17:40,042 INFO: demoting self because i do not have the lock and i was a leader

如果我改為

postgresql:
   listen: 192.168.122.240:5432
   connect_address: 192.168.122.240:5432

這發生過:

quanlm@DB2:~⟫ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
  Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
  Active: active (running) since Wed 2019-11-13 02:18:57 UTC; 40s ago
Main PID: 3785 (patroni)
   Tasks: 11
  Memory: 59.6M
     CPU: 770ms
  CGroup: /system.slice/patroni.service
          ├─3785 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
          ├─3818 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --max_replication_slots=10 --port=5432 --max_connections=100 --max_
          ├─3853 postgres: postgres: startup process   recovering 000000040000000000000006                                                                
          ├─3857 postgres: postgres: checkpointer process                                                                                                 
          ├─3858 postgres: postgres: writer process                                                                                                       
          ├─3859 postgres: postgres: stats collector process                                                                                              
          └─3916 postgres: postgres: postgres postgres 192.168.122.240(39576) idle                                                                        

Nov 13 02:19:19 DB2 patroni[3785]:         
Nov 13 02:19:24 DB2 patroni[3785]: FATAL:  could not start WAL streaming: ERROR:  replication slot "postgresql1" does not exist
Nov 13 02:19:24 DB2 patroni[3785]:         
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,938 INFO: Lock owner: postgresql0; I am postgresql1
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,938 INFO: does not have lock
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,966 INFO: no action.  i am a secondary and i am following a leader
Nov 13 02:19:29 DB2 patroni[3785]: FATAL:  could not start WAL streaming: ERROR:  replication slot "postgresql1" does not exist
Nov 13 02:19:29 DB2 patroni[3785]:         
Nov 13 02:19:34 DB2 patroni[3785]: FATAL:  could not start WAL streaming: ERROR:  replication slot "postgresql1" does not exist
Nov 13 02:19:34 DB2 patroni[3785]:   

更新#2


設置patroni.ymlname: postgresql0 並更改後

postgresql:
   listen: 192.168.122.240:5432
   connect_address: 192.168.122.240:5432

重置服務後,兩個數據庫現在都啟動了….我認為在為 HA 目的設置主動-被動伺服器時不會像那樣…而且他們沒有互相複製

圖片: https ://raw.githubusercontent.com/lmq1999/Mytest/master/Mytest/Mytest/Screenshot%20from%202019-11-13%2009-26-19.png

patroni.yml對第一個節點(名為postgresql0)很好,但您必須將第二個節點的名稱更改為(例如)postgresql1。此外,選擇另一個節點的 IP(列在該教程的頂部)並使用該節點更新 YAML(在restapi和下出現了很多次postgresql)。

如果有疑問,Patroni 儲存庫中的範例文件(https://github.com/zalando/patroni/blob/master/postgres0.yml和其他兩個)總是包含有效的值。

更新 #1 中的最後一個日誌片段是正常行為,它顯示一個實例已經在跟隨另一個實例。複製槽錯誤通常會彈出,然後很快就會消失。

每個節點的名稱在您配置的每個節點中必須是唯一的。

這是詳細的分步教程,您可以按照使用 Patroni 和 Haproxy 設置高可用性 postgrsql 集群

引用自:https://dba.stackexchange.com/questions/253049