Postgresql

與 Patroni 一起使用時,PostgreSQL 程序陷入“分叉的新後端”循環

  • November 3, 2021

我有一個 3 節點的 Patroni2.1.1 + PGSQL13.4 集群,其中一個節點在完成基本備份引導後表現得很奇怪。當通過相同的過程時,我有另一個相同的節點工作正常。

基本上,一旦 Patroni 嘗試連接到 localhost:5432,伺服器似乎就當機並重新分叉(據我所知)。您將在“循環”下方看到從connection received: host=::1 port=XXXXX.

2021-11-01 12:01:43.395 EDT [242925] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-11-01 12:01:43.396 EDT [242925] LOG:  listening on IPv6 address "::1", port 5432
2021-11-01 12:01:43.396 EDT [242925] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2021-11-01 12:01:43.397 EDT [242925] LOG:  listening on IPv4 address "192.168.23.13", port 5432
2021-11-01 12:01:43.398 EDT [242925] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-11-01 12:01:43.401 EDT [242925] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-11-01 12:01:43.405 EDT [242928] LOG:  database system was interrupted; last known up at 2021-11-01 12:01:25 EDT
2021-11-01 12:01:43.405 EDT [242928] DEBUG:  removing all temporary WAL segments
2021-11-01 12:01:43.436 EDT [242928] DEBUG:  executing restore command "cp /var/lib/pgsql/13/archive/00000077.history pg_wal/RECOVERYHISTORY"
cp: cannot stat '/var/lib/pgsql/13/archive/00000077.history': No such file or directory
2021-11-01 12:01:43.440 EDT [242928] DEBUG:  could not restore file "00000077.history" from archive: child process exited with exit code 1
2021-11-01 12:01:43.441 EDT [242928] LOG:  entering standby mode
2021-11-01 12:01:43.441 EDT [242928] DEBUG:  backup time 2021-11-01 12:01:25 EDT in file "backup_label"
2021-11-01 12:01:43.441 EDT [242928] DEBUG:  backup label pg_basebackup base backup in file "backup_label"
2021-11-01 12:01:43.441 EDT [242928] DEBUG:  backup timeline 118 in file "backup_label"
2021-11-01 12:01:43.441 EDT [242928] DEBUG:  executing restore command "cp /var/lib/pgsql/13/archive/00000076.history pg_wal/RECOVERYHISTORY"
2021-11-01 12:01:43.538 EDT [242928] LOG:  restored log file "00000076.history" from archive
2021-11-01 12:01:43.540 EDT [242928] DEBUG:  executing restore command "cp /var/lib/pgsql/13/archive/000000760000000100000000 pg_wal/RECOVERYXLOG"
2021-11-01 12:01:43.689 EDT [242928] LOG:  restored log file "000000760000000100000000" from archive
2021-11-01 12:01:43.692 EDT [242928] DEBUG:  got WAL segment from archive
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  checkpoint record is at 1/60
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  redo record is at 1/28; shutdown false
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  next transaction ID: 531; next OID: 24606
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  oldest unfrozen transaction ID: 478, in database 1
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  oldest MultiXactId: 1, in database 1
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  transaction ID wrap limit is 2147484125, limited by database with OID 1
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  starting up replication slots
2021-11-01 12:01:43.693 EDT [242928] DEBUG:  starting up replication origin progress state
2021-11-01 12:01:43.697 EDT [242928] DEBUG:  resetting unlogged relations: cleanup 1 init 0
2021-11-01 12:01:43.698 EDT [242928] DEBUG:  initializing for hot standby
2021-11-01 12:01:43.698 EDT [242928] DEBUG:  my backend ID is 1
2021-11-01 12:01:43.698 EDT [242928] LOG:  redo starts at 1/28
2021-11-01 12:01:43.698 EDT [242928] DEBUG:  prune KnownAssignedXids to 531
2021-11-01 12:01:43.698 EDT [242928] CONTEXT:  WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT [242928] DEBUG:  0 KnownAssignedXids (num=0 tail=0 head=0) 
2021-11-01 12:01:43.698 EDT [242928] CONTEXT:  WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT [242928] DEBUG:  recovery snapshots are now enabled
2021-11-01 12:01:43.698 EDT [242928] CONTEXT:  WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT [242928] DEBUG:  prune KnownAssignedXids to 531
2021-11-01 12:01:43.698 EDT [242928] CONTEXT:  WAL redo at 1/D8 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT [242928] DEBUG:  end of backup reached
2021-11-01 12:01:43.698 EDT [242928] CONTEXT:  WAL redo at 1/110 for XLOG/BACKUP_END: 1/28
2021-11-01 12:01:43.699 EDT [242932] DEBUG:  checkpointer updated shared memory configuration values
2021-11-01 12:01:43.700 EDT [242928] LOG:  consistent recovery state reached at 1/138
2021-11-01 12:01:43.700 EDT [242928] DEBUG:  executing restore command "cp /var/lib/pgsql/13/archive/000000760000000100000001 pg_wal/RECOVERYXLOG"
2021-11-01 12:01:43.700 EDT [242925] LOG:  database system is ready to accept read only connections
cp: cannot stat '/var/lib/pgsql/13/archive/000000760000000100000001': No such file or directory
2021-11-01 12:01:43.704 EDT [242928] DEBUG:  could not restore file "000000760000000100000001" from archive: child process exited with exit code 1
2021-11-01 12:01:43.704 EDT [242928] DEBUG:  could not open file "pg_wal/000000760000000100000001": No such file or directory
2021-11-01 12:01:43.704 EDT [242928] DEBUG:  switched WAL source from archive to stream after failure
2021-11-01 12:01:43.705 EDT [242936] DEBUG:  find_in_dynamic_libpath: trying "/usr/pgsql-13/lib/libpqwalreceiver"
2021-11-01 12:01:43.706 EDT [242936] DEBUG:  find_in_dynamic_libpath: trying "/usr/pgsql-13/lib/libpqwalreceiver.so"
2021-11-01 12:01:44.370 EDT [242925] DEBUG:  forked new backend, pid=242938 socket=12
2021-11-01 12:01:44.371 EDT [242938] LOG:  connection received: host=::1 port=36714
2021-11-01 12:01:51.235 EDT [242938] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG:  proc_exit(0): 1 callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG:  exit(0)
2021-11-01 12:01:51.235 EDT [242938] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:01:51.236 EDT [242925] DEBUG:  forked new backend, pid=242941 socket=12
2021-11-01 12:01:51.237 EDT [242941] LOG:  connection received: host=127.0.0.1 port=43428
2021-11-01 12:01:51.237 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:01:51.237 EDT [242925] DEBUG:  server process (PID 242938) exited with exit code 0
2021-11-01 12:01:58.284 EDT [242941] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG:  proc_exit(0): 1 callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG:  exit(0)
2021-11-01 12:01:58.284 EDT [242941] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:01:58.285 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:01:58.285 EDT [242925] DEBUG:  server process (PID 242941) exited with exit code 0
2021-11-01 12:01:59.295 EDT [242925] DEBUG:  forked new backend, pid=242944 socket=12
2021-11-01 12:01:59.295 EDT [242944] LOG:  connection received: host=::1 port=36758
2021-11-01 12:02:06.301 EDT [242944] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG:  proc_exit(0): 1 callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG:  exit(0)
2021-11-01 12:02:06.301 EDT [242944] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:06.301 EDT [242925] DEBUG:  forked new backend, pid=242948 socket=12
2021-11-01 12:02:06.302 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:02:06.302 EDT [242925] DEBUG:  server process (PID 242944) exited with exit code 0
2021-11-01 12:02:06.302 EDT [242948] LOG:  connection received: host=127.0.0.1 port=43470
2021-11-01 12:02:13.440 EDT [242948] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG:  proc_exit(0): 1 callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG:  exit(0)
2021-11-01 12:02:13.440 EDT [242948] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:13.441 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:02:13.441 EDT [242925] DEBUG:  server process (PID 242948) exited with exit code 0
2021-11-01 12:02:14.148 EDT [242925] DEBUG:  postmaster received signal 2
2021-11-01 12:02:14.148 EDT [242925] LOG:  received fast shutdown request
2021-11-01 12:02:14.150 EDT [242925] LOG:  aborting any active transactions
2021-11-01 12:02:14.150 EDT [242933] DEBUG:  shmem_exit(0): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG:  shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG:  proc_exit(0): 2 callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG:  exit(0)
2021-11-01 12:02:14.150 EDT [242933] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  shmem_exit(1): 7 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  removing all KnownAssignedXids
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  release all standby locks
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  proc_exit(1): 2 callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  exit(1)
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:14.151 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:02:14.151 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:02:14.449 EDT [242925] DEBUG:  forked new backend, pid=242954 socket=12
2021-11-01 12:02:14.450 EDT [242954] LOG:  connection received: host=::1 port=36798
2021-11-01 12:02:15.915 EDT [242936] FATAL:  terminating walreceiver process due to administrator command
2021-11-01 12:02:15.915 EDT [242936] DEBUG:  shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:15.915 EDT [242936] DEBUG:  shmem_exit(1): 6 on_shmem_exit callbacks to make
2021-11-01 12:02:15.915 EDT [242936] DEBUG:  proc_exit(1): 2 callbacks to make
2021-11-01 12:02:15.915 EDT [242936] DEBUG:  exit(1)
2021-11-01 12:02:15.916 EDT [242936] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.916 EDT [242936] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.916 EDT [242936] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.917 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:02:15.917 EDT [242932] LOG:  shutting down
2021-11-01 12:02:15.917 EDT [242932] DEBUG:  skipping restartpoint, already performed at 1/28
2021-11-01 12:02:15.919 EDT [242932] DEBUG:  updated min recovery point to 1/1000000 on timeline 118
2021-11-01 12:02:15.920 EDT [242932] DEBUG:  shmem_exit(0): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG:  shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG:  proc_exit(0): 2 callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG:  exit(0)
2021-11-01 12:02:15.920 EDT [242932] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.921 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:02:15.921 EDT [242934] DEBUG:  writing stats file "pg_stat/global.stat"
2021-11-01 12:02:15.921 EDT [242934] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.921 EDT [242934] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.921 EDT [242934] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.922 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:02:17.890 EDT [242925] DEBUG:  postmaster received signal 2
2021-11-01 12:02:18.617 EDT [242954] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG:  proc_exit(0): 1 callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG:  exit(0)
2021-11-01 12:02:18.617 EDT [242954] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:18.618 EDT [242925] DEBUG:  reaping dead processes
2021-11-01 12:02:18.618 EDT [242925] DEBUG:  server process (PID 242954) exited with exit code 0
2021-11-01 12:02:18.618 EDT [242925] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.618 EDT [242925] DEBUG:  shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:18.618 EDT [242925] DEBUG:  cleaning up dynamic shared memory control segment with ID 2104592565
2021-11-01 12:02:18.631 EDT [242925] DEBUG:  proc_exit(0): 2 callbacks to make
2021-11-01 12:02:18.631 EDT [242925] LOG:  database system is shut down
2021-11-01 12:02:18.631 EDT [242925] DEBUG:  exit(0)
2021-11-01 12:02:18.631 EDT [242925] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.631 EDT [242925] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.631 EDT [242925] DEBUG:  proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG:  logger shutting down
2021-11-01 12:02:18.633 EDT [242927] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG:  proc_exit(0): 0 callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG:  exit(0)
2021-11-01 12:02:18.633 EDT [242927] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG:  proc_exit(-1): 0 callbacks to make

如果我停止 Patroni,伺服器將開始正常執行並按預期加入複製。使用者 postgres 可以使用 psql 實用程序甚至使用 pgadmin 進行本地或遠端連接。

有人可以解釋這裡發生了什麼嗎?我仍在學習 PGSQL,但我對這些日誌還不太滿意……謝謝!

有關更多上下文,您還可以查看我發布的 Patroni 問題#2104 。

OK 終於找到原因了!在 strace 上使用 strace 我發現了以下循環:

388084 sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0\0\0\35", iov_len=4}, {iov_base="\2\0\0\n26:94667\0\226\321\200\2732\253J\266\263\301\2\243\364\360\261\375", iov_len=29}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 33
388084 read(5, "\0\0\0\267", 4)         = 4
388084 read(5, "\0\0\0\0", 4)           = 4
388084 read(5, "\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\vDOMAIN.COM\0\0\0\10P"..., 183) = 183

原來,postgres 使用者的錯誤主體仍然被記憶體,並且在 kdestroy -A 重新開始工作之後。

引用自:https://dba.stackexchange.com/questions/301981