PostgreSQL 中的“共享記憶體不足”和鎖定監控
我正在嘗試監視使用 PostgreSQL (13) 的應用程序來確定可能導致此錯誤的原因(偶爾):
ERROR: out of shared memory Hint: You might need to increase max_pred_locks_per_transaction.
我已經增加了
max_pred_locks_per_transaction
(和max_locks_per_transaction
),但我試圖在應用程序本身中找到潛在的原因,看看是否可以做一些更好的事情。當我監控鎖時,有時
SIReadLock
似乎執行了相當長的時間,都掛了COMMIT
(儘管它們最終確實完成了)。執行的許多查詢可能涉及插入/更新後觸發器。有沒有辦法準確監控哪些查詢正在創建這些鎖(而不是只看到
COMMIT
最後)?(是否有一般推薦的方法來調查這類問題?)這是監控鎖的查詢:
SELECT pg_database.datname AS database, pg_namespace.nspname AS schema, pg_class.relname AS table, pg_locks.mode AS lock_mode, pg_locks.page AS lock_page, pg_locks.tuple AS lock_tuple, pg_locks.locktype AS lock_type, pg_locks.virtualxid AS virtual_xid, pg_locks.virtualtransaction AS virtual_transaction, pg_locks.transactionid AS transaction_id, AGE(now(), pg_stat_activity.query_start) AS time_running, pg_stat_activity.query_start AS time_started, pg_stat_activity.query FROM pg_class INNER JOIN pg_locks ON pg_locks.relation = pg_class.oid INNER JOIN pg_database ON pg_database.oid = pg_locks.database INNER JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace INNER JOIN pg_stat_activity ON pg_stat_activity.pid = pg_locks.pid WHERE pg_class.relkind = 'r' ORDER BY 11 DESC;
這是一個範例輸出(未顯示更多行):
table | lock_mode | lock_page | lock_tuple | lock_type | virtual_xid | virtual_transaction| transaction_id | time_running | time_started | query ---------------+-----------------+-----------+------------+-----------+-------------+--------------------+----------------+-----------------+-------------------------------+-------------- test_table_a | SIReadLock | 8410 | | page | | 25/298509 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8416 | | page | | 25/295398 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_b | SIReadLock | | | relation | | 25/299949 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8411 | | page | | 25/296128 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_c | SIReadLock | | | relation | | 25/297437 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8413 | | page | | 25/300961 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8405 | | page | | 25/294361 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8403 | 110 | tuple | | 25/300599 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8419 | | page | | 25/298137 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_d | SIReadLock | | | relation | | 25/296086 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8413 | | page | | 25/294361 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8421 | | page | | 25/298137 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8422 | | page | | 25/296692 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_b | SIReadLock | | | relation | | 25/300080 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8405 | | page | | 25/299590 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_c | SIReadLock | | | relation | | 25/294222 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_b | SIReadLock | | | relation | | 25/299148 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8407 | | page | | 25/296692 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_d | SIReadLock | | | relation | | 25/295640 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8431 | | page | | 25/298779 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8426 | | page | | 25/294976 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8410 | | page | | 25/297792 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8416 | | page | | 25/294361 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8426 | | page | | 25/300585 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8417 | 1 | tuple | | 25/297792 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_c | SIReadLock | | | relation | | 25/295787 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_e | SIReadLock | | | relation | | 25/297764 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8421 | | page | | 25/295893 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8427 | | page | | 25/294719 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_d | SIReadLock | | | relation | | 25/297663 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_c | SIReadLock | | | relation | | 25/294139 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8431 | | page | | 25/300947 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_d | SIReadLock | | | relation | | 25/295370 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8422 | | page | | 25/297792 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_c | SIReadLock | | | relation | | 25/297064 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8410 | | page | | 25/299949 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8415 | | page | | 25/296128 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8428 | | page | | 25/300080 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8431 | | page | | 25/297423 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8417 | 1 | tuple | | 25/298509 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8408 | | page | | 25/297064 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8427 | | page | | 25/295384 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8425 | | page | | 25/294236 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT test_table_a | SIReadLock | 8415 | | page | | 25/294361 | | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
(到這個數據庫伺服器的連接通常少於 30 個,到這個數據庫的連接是 10 個。)
max_pred_locks_per_transaction
確定“謂詞鎖定目標雜湊表”的大小。這是與鎖表不同的資料結構,因此當鎖表未滿時,您可能會用完共享記憶體。
SIReadLock
事務送出後持有 s是正常的。您的查詢具有誤導性:那些COMMIT
s 沒有掛起,它們已經完成。
嘗試了一個星期後,我沒有找到任何好的解決方案。最終,我設法通過當場抓住它來確定導致此問題的潛在查詢。
增加鎖使用量通常與長時間執行的查詢或行為不合理的查詢相關聯。在某些 PSQL 版本中可能會發生這種情況,並且PSQL 版本存在錯誤,儘管這種情況很少見。
我有同樣的問題,
Hint: You might need to increase max_pred_locks_per_transaction.
我已經增加到max_pred_locks_per_transaction
2048,一個高得離譜的值。我如何追踪潛在問題:
- 監控 PostgreSQL 程序記憶體。我使用Datadog監視了執行 PSQL 的 Docker 容器。這個問題有點隨機觸發,我找不到監控每個連接或每個查詢鎖和記憶體使用的方法。特別是在這種情況下,因為有問題的查詢幾乎導致整個數據庫崩潰。
- 使用 pg_activity 監控活動查詢。您可以從任何 PSQL IDE 或
psql
cli 提示符手動執行此操作,但這pg_activity
會更容易。經過大量手動監控,我設法確定了導致超出鎖定的查詢,因此記憶體使用量增加。
- 修復查詢
就我而言,解決方案是首先擺脫查詢。您可以做的其他解決方案
- 如果此應用程序端查詢,請將其拆分為多個查詢並送出批次。確保您的應用程序僅在短時間內保持事務打開,並在更新數百或數千行後送出。不要讓單個事務觸及多行。
- 重新設計查詢,例如使用物化視圖