Postgresql

當查詢開始在備用伺服器上執行時,重播延遲開始增加

  • November 2, 2021

我有兩個具有以下規格的伺服器:

  • 8 個 vCPU、32768 MB RAM、640 GB SSD

主 Postgres 13.3 數據庫 (db1) 安裝在第一台伺服器 (Ubuntu 16.04.7) 上,配置如下:

shared_buffers = 16GB 
work_mem = 128MB
maintenance_work_mem = 8GB
effective_cache_size = 16GB

effective_io_concurrency = 400
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

wal_level = logical
synchronous_commit = on
max_wal_size = 4GB
min_wal_size = 32MB
wal_keep_size = 16384
wal_sender_timeout = 60s
checkpoint_completion_target = 0.7

synchronous_standby_names = 'FIRST 1 (db2_slave)'
max_standby_archive_delay = 1800s
max_standby_streaming_delay = 1800s

備用數據庫是安裝在第二台伺服器(Ubuntu 20.04.3)上的 Postgres 13.4 數據庫(db2),配置如下:

shared_buffers = 24GB
work_mem = 128MB
maintenance_work_mem = 16GB
effective_cache_size = 24GB

effective_io_concurrency = 400
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

wal_level = logical
synchronous_commit = on
max_wal_size = 4GB
min_wal_size = 32MB
checkpoint_completion_target = 0.7

primary_conninfo = 'host=... port=5432 user=repluser passfile=''...'' application_name=db2_slave'
primary_slot_name = 'db2'
hot_standby = on
max_standby_archive_delay = 1800s
max_standby_streaming_delay = 1800s

如果我在備用伺服器上執行 iotop -u postgresql ,我會看到兩個程序:

2229172 postgres: 13/main: walreceiver streaming DDFD/8E9FE9E0
2229138 postgres: 13/main: startup recovering 000000010000DDFD0000008E

在我在待機SELECT COUNT(*) FROM big_table;

2229138 postgres: 13/main: startup recovering 000000010000DE0400000017 waiting

我在master上執行了這個查詢:

SELECT client_addr                                                       as client,
      usename                                                           as user,
      application_name                                                  as name,
      state,
      sync_state                                                        as mode,
      pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))   as pending,
      pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn))              as write,
      pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn))             as flush,
      pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn))            as replay,
      pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as total_lag
FROM pg_stat_replication;

輸出是:

  client    |   user   |   name    |   state   | mode | pending |  write  |  flush  | replay | total_lag 
-------------+----------+-----------+-----------+------+---------+---------+---------+--------+-----------
...         | repluser | db2_slave | streaming | sync | 0 bytes | 0 bytes | 0 bytes | 21 MB  | 21 MB
(1 row)

如果我多次執行此請求,則在執行此查詢 ( SELECT COUNT(*) FROM big_table) 期間,重播和總延遲會一直增加。因此,我想知道問題的答案:

  1. 為什麼在執行副本的分析查詢期間重播延遲不斷增加?
  2. 為什麼我一開始對備用伺服器的請求,恢復程序就處於“等待”狀態?

原因是恢復衝突。數據庫可以延遲複製或取消衝突查詢。這由max_standby_streaming_delay備用數據庫上的參數控制,該參數確定 PostgreSQL 在取消有問題的查詢之前準備好延遲重播複製資訊的時間。

hot_standby_feedback您可以通過在備用伺服器上設置為來減少衝突的數量on(冒著使主伺服器上的表膨脹的風險),但這並不能完全解決問題。

本質上,您將無法在備用數據庫上執行不間斷的查詢,並且同時沒有複製延遲。

有關詳細資訊,請參閱我關於該主題的文章。

引用自:https://dba.stackexchange.com/questions/301925