Postgresql
當查詢開始在備用伺服器上執行時,重播延遲開始增加
我有兩個具有以下規格的伺服器:
- 8 個 vCPU、32768 MB RAM、640 GB SSD
主 Postgres 13.3 數據庫 (db1) 安裝在第一台伺服器 (Ubuntu 16.04.7) 上,配置如下:
shared_buffers = 16GB work_mem = 128MB maintenance_work_mem = 8GB effective_cache_size = 16GB effective_io_concurrency = 400 max_worker_processes = 8 max_parallel_workers_per_gather = 4 max_parallel_workers = 8 wal_level = logical synchronous_commit = on max_wal_size = 4GB min_wal_size = 32MB wal_keep_size = 16384 wal_sender_timeout = 60s checkpoint_completion_target = 0.7 synchronous_standby_names = 'FIRST 1 (db2_slave)' max_standby_archive_delay = 1800s max_standby_streaming_delay = 1800s
備用數據庫是安裝在第二台伺服器(Ubuntu 20.04.3)上的 Postgres 13.4 數據庫(db2),配置如下:
shared_buffers = 24GB work_mem = 128MB maintenance_work_mem = 16GB effective_cache_size = 24GB effective_io_concurrency = 400 max_worker_processes = 8 max_parallel_workers_per_gather = 4 max_parallel_workers = 8 wal_level = logical synchronous_commit = on max_wal_size = 4GB min_wal_size = 32MB checkpoint_completion_target = 0.7 primary_conninfo = 'host=... port=5432 user=repluser passfile=''...'' application_name=db2_slave' primary_slot_name = 'db2' hot_standby = on max_standby_archive_delay = 1800s max_standby_streaming_delay = 1800s
如果我在備用伺服器上執行 iotop -u postgresql ,我會看到兩個程序:
2229172 postgres: 13/main: walreceiver streaming DDFD/8E9FE9E0 2229138 postgres: 13/main: startup recovering 000000010000DDFD0000008E
在我在待機(
SELECT COUNT(*) FROM big_table;
2229138 postgres: 13/main: startup recovering 000000010000DE0400000017 waiting
我在master上執行了這個查詢:
SELECT client_addr as client, usename as user, application_name as name, state, sync_state as mode, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) as pending, pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn)) as write, pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn)) as flush, pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn)) as replay, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as total_lag FROM pg_stat_replication;
輸出是:
client | user | name | state | mode | pending | write | flush | replay | total_lag -------------+----------+-----------+-----------+------+---------+---------+---------+--------+----------- ... | repluser | db2_slave | streaming | sync | 0 bytes | 0 bytes | 0 bytes | 21 MB | 21 MB (1 row)
如果我多次執行此請求,則在執行此查詢 (
SELECT COUNT(*) FROM big_table
) 期間,重播和總延遲會一直增加。因此,我想知道問題的答案:
- 為什麼在執行副本的分析查詢期間重播延遲不斷增加?
- 為什麼我一開始對備用伺服器的請求,恢復程序就處於“等待”狀態?
原因是恢復衝突。數據庫可以延遲複製或取消衝突查詢。這由
max_standby_streaming_delay
備用數據庫上的參數控制,該參數確定 PostgreSQL 在取消有問題的查詢之前準備好延遲重播複製資訊的時間。
hot_standby_feedback
您可以通過在備用伺服器上設置為來減少衝突的數量on
(冒著使主伺服器上的表膨脹的風險),但這並不能完全解決問題。本質上,您將無法在備用數據庫上執行不間斷的查詢,並且同時沒有複製延遲。
有關詳細資訊,請參閱我關於該主題的文章。