Postgresql大事務OOM殺手

May 13, 2018

我們有一個Postgresql 9.5正在執行CentOS release 6.5 (Final) 的伺服器有 16GB 的記憶體。我們正在執行一個長事務，它在多個表中添加和更新數以千萬計的行。除了這個事務之外，在這個階段沒有在數據庫上執行任何查詢，但它可以在其他查詢期間執行，因此必須進行事務處理。

它工作正常，直到 OOM 殺手認為它足夠並終止postmaster程序：

Out of memory: Kill process 1766 (postmaster) score 890 or sacrifice child
Killed process 1766, UID 26, (postmaster) total-vm:24384508kB, anon-rss:14376288kB, file-rss:138616kB

以下是相關的 postgres 配置：

#-------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#-------------------------------------------------------------------------

# - Memory -

shared_buffers = 128MB                  # min 128kB
                                       # (change requires restart)
#huge_pages = try                       # on, off, or try
                                       # (change requires restart)
#temp_buffers = 8MB                     # min 800kB
#max_prepared_transactions = 0          # zero disables the feature
                                       # (change requires restart)
# Caution: it is not advisable to set max_prepared_transactions nonzero unless
# you actively intend to use prepared transactions.
#work_mem = 4MB                         # min 64kB
#maintenance_work_mem = 64MB            # min 1MB
#autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB                  # min 100kB
dynamic_shared_memory_type = posix      # the default is the first option
                                       # supported by the operating system:
                                       #   posix
                                       #   sysv
                                       #   windows
                                       #   mmap
                                       # use none to disable dynamic shared memory

# - Disk -

#temp_file_limit = -1                   # limits per-session temp file space
                                       # in kB, or -1 for no limit

# - Kernel Resource Usage -

#max_files_per_process = 1000           # min 25
                                       # (change requires restart)
#shared_preload_libraries = ''          # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0                  # 0-100 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100            # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multipler on buffers scanned/round

# - Asynchronous Behavior -

#effective_io_concurrency = 1           # 1-1000; 0 disables prefetching
#max_worker_processes = 8

也vm.overcommit_memory設置為2

我發現的一種解決方案是為我試圖避免的特定程序禁用 OOM 殺手，因為我覺得這不是正確的解決方案。我沒有找到任何方法來強制 postgres 將記憶體刷新到磁碟或限制使用的記憶體總量。

有什麼建議麼？

好的，所以在嘗試了很多東西之後，我找到了解決方案。
這個問題隱藏在我們對JDBC庫的使用中。
我們與設置為 false 的自動送出建立了長期聯繫：
connection.setAutoCommit(false)
在那段時間裡，我們做了很多小查詢和一些帶有游標的查詢：
statement.setFetchSize(SOME_FETCH_SIZE)
在 JDBC 中，您創建一個connection對象，然後從該連接中創建statements. 當你執行語句時，你會得到一個result set.
現在，這些對像中的每一個都需要關閉，但是如果您關閉語句，則條目集將關閉，如果您關閉連接，則所有語句及其結果集都將關閉。
我們習慣於使用自己的連接進行短暫的查詢，因此我們從不關閉語句，假設連接一旦關閉就會處理這些事情。
現在的問題在於這個從未關閉連接的長時間事務（約 24 小時）。聲明從未結束。顯然，語句對像在執行程式碼的伺服器和 PostgreSQL 數據庫上都擁有資源。
我對數據庫中剩餘資源的最佳猜測是與游標相關的內容。使用游標的語句從未關閉，因此它們返回的結果集也從未關閉。這意味著數據庫沒有釋放數據庫中的相關游標資源，並且由於它位於一個巨大的表上，因此需要大量的 RAM。
我唯一不明白的是，為什麼 PostgreSQL 在分配記憶體失敗時沒有將一些資源保存在磁碟上（不是交換）。
希望它對將來有類似問題的人有所幫助。

引用自：https://dba.stackexchange.com/questions/206448

Postgresql大事務OOM殺手

相關問答

COMMIT 是否在 PostgreSQL 9.5 中的匿名 plgpsql 函式中工作？

可以在讀取送出事務中的一個查詢中更改子選擇嗎？

如何在事務 ID 環繞後比較 xmin 和 txid_current()？

為什麼我得到舊數據？

簡化重複的 INSERT 命令，為未找到的值放置 NULL

Oracle 更改 PGA 導致記憶體問題