高可用的 PostgreSQL 與讚助人
我按照本指南為 Postgres HA 設置實驗室。
我完全按照指南(在我的情況下更改 IP 地址),畢竟這一切都在 Postgres 伺服器 1 上工作
但是當談到 Postgres server 2
patroni.yml
setup該指南說在兩個 Postgres 伺服器上具有相同的
patroni.yml
設置,但是當重新啟動patroni service
這個問題發生在 server1
quanlm@DB1:~$ sudo service patroni status ● patroni.service - Runners to orchestrate a high-availability PostgreSQL Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled) Active: active (running) since Tue 2019-11-12 07:35:33 UTC; 14min ago Main PID: 411 (patroni) Tasks: 12 Memory: 77.6M CPU: 4.041s CGroup: /system.slice/patroni.service ├─411 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml ├─431 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.122.77 --max_prepared_tran ├─435 postgres: postgres: checkpointer process ├─436 postgres: postgres: writer process ├─439 postgres: postgres: stats collector process ├─447 postgres: postgres: postgres postgres 192.168.122.77(49984) idle ├─455 postgres: postgres: wal writer process └─456 postgres: postgres: autovacuum launcher process Nov 12 07:49:28 DB1 patroni[411]: 2019-11-12 07:49:28,533 INFO: no action. i am the leader with the lock Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,459 INFO: Lock owner: postgresql0; I am postgresql0 Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,536 INFO: no action. i am the leader with the lock Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,459 INFO: Lock owner: postgresql0; I am postgresql0 Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,544 INFO: no action. i am the leader with the lock Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,458 INFO: Lock owner: postgresql0; I am postgresql0 Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,548 INFO: no action. i am the leader with the lock Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,457 INFO: Lock owner: postgresql0; I am postgresql0 Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,539 INFO: no action. i am the leader with the lock Nov 12 07:50:19 DB1 patroni[411]: 2019-11-12 07:50:19,949 INFO: acquired session lock as a leader
是的,伺服器 1 很好,但在伺服器 2 上
quanlm@DB2:~$ sudo service patroni status ● patroni.service - Runners to orchestrate a high-availability PostgreSQL Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled) Active: failed (Result: exit-code) since Tue 2019-11-12 07:50:02 UTC; 2s ago Process: 9514 ExecStart=/usr/local/bin/patroni /etc/patroni.yml (code=exited, status=1/FAILURE) Main PID: 9514 (code=exited, status=1/FAILURE) Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/socketserver.py", line 440, in __init__ Nov 12 07:50:02 DB2 patroni[9514]: self.server_bind() Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/http/server.py", line 138, in server_bind Nov 12 07:50:02 DB2 patroni[9514]: socketserver.TCPServer.server_bind(self) Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/socketserver.py", line 454, in server_bind Nov 12 07:50:02 DB2 patroni[9514]: self.socket.bind(self.server_address) Nov 12 07:50:02 DB2 patroni[9514]: OSError: [Errno 99] Cannot assign requested address Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Main process exited, code=exited, status=1/FAILURE Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Unit entered failed state. Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Failed with result 'exit-code'.
畢竟它不起作用。
我允許通過編輯和
listen_addresses = '*'
遠端連接兩個伺服器postgresql.conf
host all all 0.0.0.0/0 md5
在
pg_hba.conf
所以當HAproxy上班的時候,第1台宕機的時候,第2台伺服器沒有上去。
問題肯定在
patroni
伺服器 2 上,但如何解決呢?否則,有什麼辦法可以繞過 HA postgresql 伺服器?
P/s:防火牆設置
quanlm@DB1:~$ sudo ufw status Status: inactive quanlm@DB1:~$ sudo iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy ACCEPT) target prot opt source destination Chain OUTPUT (policy ACCEPT) target prot opt source destination
quanlm@DB2:~$ sudo ufw status Status: inactive quanlm@DB2:~$ sudo iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy ACCEPT) target prot opt source destination Chain OUTPUT (policy ACCEPT) target prot opt source destination
P/ss: 我的
patroni.yml
quanlm@DB1:~$ cat /etc/patroni.yml scope: postgres namespace: /db/ name: postgresql0 restapi: listen: 192.168.122.77:8008 connect_address: 192.168.122.77:8008 etcd: host: 192.168.122.156:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 postgresql: use_pg_rewind: true initdb: - encoding: UTF8 - data-checksums pg_hba: - host replication replicator 127.0.0.1/32 md5 - host replication replicator 192.168.122.77/0 md5 - host replication replicator 192.168.122.240/0 md5 - host all all 0.0.0.0/0 md5 users: admin: password: admin options: - createrole - createdb postgresql: listen: 192.168.122.77:5432 connect_address: 192.168.122.77:5432 data_dir: /data/patroni pgpass: /tmp/pgpass authentication: replication: username: replicator password: password superuser: username: postgres password: password parameters: unix_socket_directories: '.' tags: nofailover: false noloadbalance: false clonefrom: false nosync: false
是的,在兩台伺服器上
更新#1
我
patroni.yml
有零錢name: postgresql0
->name: postgresqp1
其餘 api 設置為主機 IP ‘192.168.122.240’
但是一個
postgresql: listen: 192.168.122.77:5432 connect_address: 192.168.122.77:5432
發生了這個問題:
quanlm@DB2:~⟫ sudo service patroni status ● patroni.service - Runners to orchestrate a high-availability PostgreSQL Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled) Active: active (running) since Wed 2019-11-13 02:17:16 UTC; 25s ago Main PID: 32363 (patroni) Tasks: 6 Memory: 45.7M CPU: 6.326s CGroup: /system.slice/patroni.service ├─ 1014 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --port=5432 --wal_level=hot_standby --max_wal_senders=10 --cluster └─32363 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml Nov 13 02:17:39 DB2 patroni[32363]: 192.168.122.77:5432 - accepting connections Nov 13 02:17:39 DB2 patroni[32363]: 192.168.122.77:5432 - accepting connections Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: Lock owner: postgresql0; I am postgresql1 Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: does not have lock Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: establishing a new patroni connection to the postgres cluster Nov 13 02:17:40 DB2 patroni[32363]: LOG: could not bind IPv4 socket: Cannot assign requested address Nov 13 02:17:40 DB2 patroni[32363]: HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry. Nov 13 02:17:40 DB2 patroni[32363]: WARNING: could not create listen socket for "192.168.122.77" Nov 13 02:17:40 DB2 patroni[32363]: FATAL: could not create any TCP/IP sockets Nov 13 02:17:40 DB2 patroni[32363]: 2019-11-13 02:17:40,042 INFO: demoting self because i do not have the lock and i was a leader
如果我改為
postgresql: listen: 192.168.122.240:5432 connect_address: 192.168.122.240:5432
這發生過:
quanlm@DB2:~⟫ sudo service patroni status ● patroni.service - Runners to orchestrate a high-availability PostgreSQL Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled) Active: active (running) since Wed 2019-11-13 02:18:57 UTC; 40s ago Main PID: 3785 (patroni) Tasks: 11 Memory: 59.6M CPU: 770ms CGroup: /system.slice/patroni.service ├─3785 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml ├─3818 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --max_replication_slots=10 --port=5432 --max_connections=100 --max_ ├─3853 postgres: postgres: startup process recovering 000000040000000000000006 ├─3857 postgres: postgres: checkpointer process ├─3858 postgres: postgres: writer process ├─3859 postgres: postgres: stats collector process └─3916 postgres: postgres: postgres postgres 192.168.122.240(39576) idle Nov 13 02:19:19 DB2 patroni[3785]: Nov 13 02:19:24 DB2 patroni[3785]: FATAL: could not start WAL streaming: ERROR: replication slot "postgresql1" does not exist Nov 13 02:19:24 DB2 patroni[3785]: Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,938 INFO: Lock owner: postgresql0; I am postgresql1 Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,938 INFO: does not have lock Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,966 INFO: no action. i am a secondary and i am following a leader Nov 13 02:19:29 DB2 patroni[3785]: FATAL: could not start WAL streaming: ERROR: replication slot "postgresql1" does not exist Nov 13 02:19:29 DB2 patroni[3785]: Nov 13 02:19:34 DB2 patroni[3785]: FATAL: could not start WAL streaming: ERROR: replication slot "postgresql1" does not exist Nov 13 02:19:34 DB2 patroni[3785]:
更新#2
設置
patroni.yml
回name: postgresql0
並更改後postgresql: listen: 192.168.122.240:5432 connect_address: 192.168.122.240:5432
重置服務後,兩個數據庫現在都啟動了….我認為在為 HA 目的設置主動-被動伺服器時不會像那樣…而且他們沒有互相複製
您
patroni.yml
對第一個節點(名為postgresql0
)很好,但您必須將第二個節點的名稱更改為(例如)postgresql1
。此外,選擇另一個節點的 IP(列在該教程的頂部)並使用該節點更新 YAML(在restapi
和下出現了很多次postgresql
)。如果有疑問,Patroni 儲存庫中的範例文件(https://github.com/zalando/patroni/blob/master/postgres0.yml和其他兩個)總是包含有效的值。
更新 #1 中的最後一個日誌片段是正常行為,它顯示一個實例已經在跟隨另一個實例。複製槽錯誤通常會彈出,然後很快就會消失。
每個節點的名稱在您配置的每個節點中必須是唯一的。
這是詳細的分步教程,您可以按照使用 Patroni 和 Haproxy 設置高可用性 postgrsql 集群。