Postgresql

PostgreSQL 中的“共享記憶體不足”和鎖定監控

  • December 5, 2021

我正在嘗試監視使用 PostgreSQL (13) 的應用程序來確定可能導致此錯誤的原因(偶爾):

ERROR: out of shared memory
Hint: You might need to increase max_pred_locks_per_transaction.

我已經增加了max_pred_locks_per_transaction(和max_locks_per_transaction),但我試圖在應用程序本身中找到潛在的原因,看看是否可以做一些更好的事情。

當我監控鎖時,有時SIReadLock似乎執行了相當長的時間,都掛了COMMIT(儘管它們最終確實完成了)。執行的許多查詢可能涉及插入/更新後觸發器。

有沒有辦法準確監控哪些查詢正在創建這些鎖(而不是只看到COMMIT最後)?(是否有一般推薦的方法來調查這類問題?)

這是監控鎖的查詢:

SELECT pg_database.datname AS database,
      pg_namespace.nspname AS schema,
      pg_class.relname AS table,
      pg_locks.mode AS lock_mode,
      pg_locks.page AS lock_page,
      pg_locks.tuple AS lock_tuple,
      pg_locks.locktype AS lock_type,
      pg_locks.virtualxid AS virtual_xid,
      pg_locks.virtualtransaction AS virtual_transaction,
      pg_locks.transactionid AS transaction_id,
      AGE(now(), pg_stat_activity.query_start) AS time_running,
      pg_stat_activity.query_start AS time_started,
      pg_stat_activity.query
FROM pg_class
   INNER JOIN pg_locks ON pg_locks.relation = pg_class.oid
   INNER JOIN pg_database ON pg_database.oid = pg_locks.database
   INNER JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
   INNER JOIN pg_stat_activity ON pg_stat_activity.pid = pg_locks.pid
WHERE pg_class.relkind = 'r'
   ORDER BY 11 DESC;

這是一個範例輸出(未顯示更多行):

  table       |    lock_mode    | lock_page | lock_tuple | lock_type | virtual_xid | virtual_transaction| transaction_id |  time_running   |         time_started          |    query
---------------+-----------------+-----------+------------+-----------+-------------+--------------------+----------------+-----------------+-------------------------------+--------------
test_table_a  | SIReadLock      |      8410 |            | page      |             | 25/298509          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8416 |            | page      |             | 25/295398          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_b  | SIReadLock      |           |            | relation  |             | 25/299949          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8411 |            | page      |             | 25/296128          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_c  | SIReadLock      |           |            | relation  |             | 25/297437          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8413 |            | page      |             | 25/300961          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8405 |            | page      |             | 25/294361          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8403 |        110 | tuple     |             | 25/300599          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8419 |            | page      |             | 25/298137          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_d  | SIReadLock      |           |            | relation  |             | 25/296086          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8413 |            | page      |             | 25/294361          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8421 |            | page      |             | 25/298137          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8422 |            | page      |             | 25/296692          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_b  | SIReadLock      |           |            | relation  |             | 25/300080          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8405 |            | page      |             | 25/299590          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_c  | SIReadLock      |           |            | relation  |             | 25/294222          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_b  | SIReadLock      |           |            | relation  |             | 25/299148          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8407 |            | page      |             | 25/296692          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_d  | SIReadLock      |           |            | relation  |             | 25/295640          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8431 |            | page      |             | 25/298779          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8426 |            | page      |             | 25/294976          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8410 |            | page      |             | 25/297792          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8416 |            | page      |             | 25/294361          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8426 |            | page      |             | 25/300585          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8417 |          1 | tuple     |             | 25/297792          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_c  | SIReadLock      |           |            | relation  |             | 25/295787          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_e  | SIReadLock      |           |            | relation  |             | 25/297764          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8421 |            | page      |             | 25/295893          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8427 |            | page      |             | 25/294719          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_d  | SIReadLock      |           |            | relation  |             | 25/297663          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_c  | SIReadLock      |           |            | relation  |             | 25/294139          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8431 |            | page      |             | 25/300947          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_d  | SIReadLock      |           |            | relation  |             | 25/295370          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8422 |            | page      |             | 25/297792          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_c  | SIReadLock      |           |            | relation  |             | 25/297064          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8410 |            | page      |             | 25/299949          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8415 |            | page      |             | 25/296128          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8428 |            | page      |             | 25/300080          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8431 |            | page      |             | 25/297423          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8417 |          1 | tuple     |             | 25/298509          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8408 |            | page      |             | 25/297064          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8427 |            | page      |             | 25/295384          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8425 |            | page      |             | 25/294236          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT
test_table_a  | SIReadLock      |      8415 |            | page      |             | 25/294361          |                | 00:45:46.929129 | 2021-09-06 12:13:31.525553+01 | COMMIT

(到這個數據庫伺服器的連接通常少於 30 個,到這個數據庫的連接是 10 個。)

max_pred_locks_per_transaction確定“謂詞鎖定目標雜湊表”的大小。這是與鎖表不同的資料結構,因此當鎖表未滿時,您可能會用完共享記憶體。

SIReadLock事務送出後持有 s是正常的。您的查詢具有誤導性:那些COMMITs 沒有掛起,它們已經完成。

嘗試了一個星期後,我沒有找到任何好的解決方案。最終,我設法通過當場抓住它來確定導致此問題的潛在查詢。

增加鎖使用量通常與長時間執行的查詢或行為不合理的查詢相關聯。在某些 PSQL 版本中可能會發生這種情況,並且PSQL 版本存在錯誤,儘管這種情況很少見。

我有同樣的問題,Hint: You might need to increase max_pred_locks_per_transaction.我已經增加到max_pred_locks_per_transaction2048,一個高得離譜的值。

我如何追踪潛在問題:

  1. 監控 PostgreSQL 程序記憶體。我使用Datadog監視了執行 PSQL 的 Docker 容器。這個問題有點隨機觸發,我找不到監控每個連接每個查詢鎖和記憶體使用的方法。特別是在這種情況下,因為有問題的查詢幾乎導致整個數據庫崩潰。

在此處輸入圖像描述

  1. 使用 pg_activity 監控活動查詢。您可以從任何 PSQL IDE 或psqlcli 提示符手動執行此操作,但這pg_activity會更容易。

在此處輸入圖像描述

經過大量手動監控,我設法確定了導致超出鎖定的查詢,因此記憶體使用量增加。

  1. 修復查詢

就我而言,解決方案是首先擺脫查詢。您可以做的其他解決方案

  • 如果此應用程序端查詢,請將其拆分為多個查詢並送出批次。確保您的應用程序僅在短時間內保持事務打開,並在更新數百或數千行後送出。不要讓單個事務觸及多行。
  • 重新設計查詢,例如使用物化視圖

引用自:https://dba.stackexchange.com/questions/300722