對大型數據庫的查詢會終止與伺服器的連接,與 LIMIT 一起使用
我正在嘗試在大型數據庫上執行查詢而不終止與伺服器的連接。
我在具有 16gb 記憶體和大約 40gb 可用磁碟的 mac 上使用 Postgres 12.1。數據庫為 78gb,
pg_database_size
根據 do 最大表為 20gbpg_total_relation_size
。無論我執行哪個非工作查詢,我(從日誌中)得到的錯誤是:
server process (PID xxx) was terminated by signal 9: Killed: 9
在 VS 程式碼中,錯誤是
"lost connection to server"
.兩個不起作用的例子是:
UPDATE table SET column = NULL WHERE column = 0;
select columnA from table1 where columnA NOT IN ( select columnB from table2 );
LIMIT
我可以通過添加例如1,000,000來執行一些查詢(例如上面的查詢) 。我懷疑由於臨時文件而導致磁碟不足,但在日誌中(帶有
log_temp_files = 0
),我看不到任何正在寫入的臨時文件。我嘗試增加和減少
work_mem
,maintenance_work_mem
,shared_buffers
和temp_buffers
. 沒有工作,性能大致相同。我嘗試刪除所有索引,這降低了某些查詢的“成本”,但它們仍然終止了與伺服器的連接。
什麼可能是我的問題,我該如何進一步解決這個問題?
此外,我讀到來自超時查詢的臨時文件儲存在 pqsql_tmp 中。我檢查了文件夾,它沒有很大的文件。臨時文件可以儲存在其他地方嗎?
執行失敗查詢的 posgtres 日誌如下所示:
2020-02-17 09:31:08.626 CET [94908] LOG: server process (PID xxx) was terminated by signal 9: Killed: 9 2020-02-17 09:31:08.626 CET [94908] DETAIL: Failed process was running: update table set columnname = NULL where columnname = 0; 2020-02-17 09:31:08.626 CET [94908] LOG: terminating any other active server processes 2020-02-17 09:31:08.626 CET [94919] WARNING: terminating connection because of crash of another server process 2020-02-17 09:31:08.626 CET [94919] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exi$ 2020-02-17 09:31:08.626 CET [94919] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2020-02-17 09:31:08.626 CET [94914] WARNING: terminating connection because of crash of another server process 2020-02-17 09:31:08.626 CET [94914] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exi$ 2020-02-17 09:31:08.626 CET [94914] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2020-02-17 09:31:08.629 CET [94908] LOG: all server processes terminated; reinitializing 2020-02-17 09:31:08.698 CET [94927] LOG: database system was interrupted; last known up at 2020-02-17 09:30:57 CET 2020-02-17 09:31:08.901 CET [94927] LOG: database system was not properly shut down; automatic recovery in progress 2020-02-17 09:31:08.906 CET [94927] LOG: invalid record length at 17/894C438: wanted 24, got 0 2020-02-17 09:31:08.906 CET [94927] LOG: redo is not required
EXPLAIN
在第二個範例查詢上執行返回:Seq Scan on gas_prices_all p (cost=459.93..5635583.33 rows=128975016 width=16) Filter: (NOT (hashed SubPlan 1)) SubPlan 1 -> Seq Scan on gas_station g (cost=0.00..423.14 rows=14714 width=16) JIT: Functions: 13 Options: Inlining true, Optimization true, Expressions true, Deforming true
FWIW 對於“成本”大約為零的查詢,我得到了同樣的錯誤。
更新:我發現了今天早些時候的崩潰報告:
Process: postgres [41042] Path: /Users/USER/*/postgres Identifier: postgres Version: 0 Code Type: X86-64 (Native) Parent Process: postgres [40107] Responsible: postgres [41042] User ID: 502 Date/Time: 2020-02-18 11:16:26.210 +0100 OS Version: Mac OS X 10.14.5 (18F132) Report Version: 12 Anonymous UUID: F41CCD21-C558-6CB0-316D-D1FF3E279576 Sleep/Wake UUID: 5F08EAEC-373A-4D19-A243-E812E68D2697 Time Awake Since Boot: 1600000 seconds Time Since Wake: 5700 seconds System Integrity Protection: enabled Crashed Thread: 0 Dispatch queue: com.apple.main-thread Exception Type: EXC_BAD_ACCESS (Code Signature Invalid) Exception Codes: 0x0000000000000032, 0x00000001044c4060 Exception Note: EXC_CORPSE_NOTIFY Termination Reason: Namespace CODESIGNING, Code 0x2 kernel messages: VM Regions Near 0x1044c4060: __LINKEDIT 0000000104466000-00000001044c4000 [ 376K] r--/rw- SM=COW /Users/USER/*/*.dylib --> VM_ALLOCATE 00000001044c4000-00000001044c5000 [ 4K] r-x/rwx SM=ZER VM_ALLOCATE 00000001044c5000-00000001044c6000 [ 4K] rw-/rwx SM=ZER Application Specific Information: crashed on child side of fork pre-exec Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 ??? 0x00000001044c4060 0 + 4367073376 1 postgres 0x0000000103798851 ExecAgg + 1185 (executor.h:308) 2 postgres 0x0000000103785d0f standard_ExecutorRun + 287 (execMain.c:1652) 3 postgres 0x0000000103789c4e ParallelQueryMain + 670 (execParallel.c:1405) 4 postgres 0x00000001036520ce ParallelWorkerMain + 1054 (parallel.c:1434) 5 postgres 0x000000010385bec5 StartBackgroundWorker + 533 (bgworker.c:834) 6 postgres 0x000000010386acb9 maybe_start_bgworkers + 1161 7 postgres 0x00000001038696c5 sigusr1_handler + 357 (postmaster.c:5167) 8 libsystem_platform.dylib 0x00007fff76195b5d _sigtramp + 29 9 ??? 0x0000000000003200 0 + 12800 10 postgres 0x00000001037d54ae main + 1678 11 libdyld.dylib 0x00007fff75faa3d5 start + 1 Thread 0 crashed with X86 Thread State (64-bit): rax: 0x00000001044c4060 rbx: 0x00007f98b9045600 rcx: 0x00000001072c48d8 rdx: 0x00007ffeec6242f4 rdi: 0x00007f98b9047520 rsi: 0x00007f98b9045fc0 rbp: 0x00007ffeec624320 rsp: 0x00007ffeec624288 r8: 0x00000000014bafca r9: 0xffffffff00000000 r10: 0x00000001072c48d0 r11: 0x0000000000000005 r12: 0x0000000103c51220 r13: 0x00007f98b9047510 r14: 0x00007f98b9045fc0 r15: 0x00007f98b90459a0 rip: 0x00000001044c4060 rfl: 0x0000000000010246 cr2: 0x00000001044c4060 Logical CPU: 0 Error Code: 0x00000015 Trap Number: 14 Binary Images: **lots of stuff** External Modification Summary: Calls made by other processes targeting this process: task_for_pid: 0 thread_create: 0 thread_set_state: 0 Calls made by this process: task_for_pid: 0 thread_create: 0 thread_set_state: 0 Calls made by all processes on this machine: task_for_pid: 134420121 thread_create: 0 thread_set_state: 0 VM Region Summary: ReadOnly portion of Libraries: Total=398.8M resident=0K(0%) swapped_out_or_unallocated=398.8M(100%) Writable regions: Total=4.2G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=4.2G(100%) VIRTUAL REGION REGION TYPE SIZE COUNT (non-coalesced) =========== ======= ======= Kernel Alloc Once 8K 1 MALLOC 81.0M 19 MALLOC guard page 16K 3 MALLOC_LARGE (reserved) 72K 2 reserved VM address space (unallocated) STACK GUARD 56.0M 1 Stack 8192K 1 VM_ALLOCATE 4.1G 4 __DATA 18.0M 200 __FONT_DATA 4K 1 __LINKEDIT 232.2M 11 __TEXT 166.6M 199 __UNICODE 564K 1 mapped file 64K 1 shared memory 552K 6 =========== ======= ======= TOTAL 4.7G 450 TOTAL, minus reserved VM space 4.7G 450
我可以在日誌中找到兩件可能很有趣的事情:
2020-02-18 19:01:52.044375+0100 localhost kernel[0]: CODE SIGNING: process 51528[postgres]: rejecting invalid page at address 0x1100c1000 from offset 0x0 in file "<nil>" (cs_mtime:0.0 == mtime:0.0) (signed:0 validated:0 tainted:0 nx:0 wpmapped:1 dirty:0 depth:0) 2020-02-18 19:01:52.044805+0100 localhost ReportCrash[52560]: unknown nested kcdata type: 0x1004, size: 2108
2020-02-18 19:01:55.268060+0100 localhost ReportCrash[52560]: (CrashReporterSupport) Saved crash report for postgres[51528] version 0 to postgres_2020-02-18-190155_MacBook-Pro.crash 2020-02-18 19:01:55.273159+0100 localhost ReportCrash[52560]: (CrashReporterSupport) Removing excessive log: postgres_2020-02-18-190155_MacBook-Pro.crash 2020-02-18 19:01:55.274208+0100 localhost ReportCrash[52560]: shouldDisplayUnexpectedlyQuitNotification is NO
更新:我執行了第二個 exaple 查詢(上面),
LIMIT
直到它從工作(LIMIT
2,200,000)到終止與伺服器的連接(LIMIT
2,300,000)。EXPLAIN ANALYZE
關於LIMIT
2,200,000 的查詢是:Limit (cost=459.93..96581.42 rows=2200000 width=16) (actual time=13.228..38573.440 rows=2200000 loops=1) -> Seq Scan table1 (cost=459.93..5635583.33 rows=128975016 width=16) (actual time=13.227..38374.070 rows=2200000 loops=1) Filter: (NOT (hashed SubPlan 1)) Rows Removed by Filter: 139729529 SubPlan 1 -> Seq Scan on table2 (cost=0.00..423.14 rows=14714 width=16) (actual time=0.350..6.925 rows=14714 loops=1) Planning Time: 0.138 ms Execution Time: 38685.762 ms
EXPLAIN
在LIMIT
2,300,000 上是:(EXPLAIN ANALYZE
在這裡崩潰)Limit (cost=459.93..100950.58 rows=2300000 width=16) -> Seq Scan on table1 (cost=459.93..5635583.33 rows=128975016 width=16) Filter: (NOT (hashed SubPlan 1)) SubPlan 1 -> Seq Scan on table2 (cost=0.00..423.14 rows=14714 width=16) JIT: Functions: 14 Options: Inlining false, Optimization false, Expressions true, Deforming true
我正在讀這篇文章,因為 JIT 在這裡踢,這是有道理的給
jit_above_cost
定(預設)設置為 100,000。所以也許問題出在 JIT 上?更新 2:使用
jit=off
第二個範例查詢有效,並且也是 ca。快兩倍。這怎麼可能以及什麼可能導致 JIT 在我的系統上出現問題?
感謝您的所有建議和意見。
我已經玩了足夠長的時間來得出結論,禁用 JIT (
jit = off
) 解決了這個問題。我可以看到 JIT 僅在 PostgreSQL 11 中引入,這可能就是為什麼網上沒有太多關於它可能導致的問題的原因。
“信號 9”表明這幾乎可以肯定是記憶體不足的殺手。您可以查看
/var/log/kern.log
或在您的系統上呼叫任何內容進行驗證。臨時文件上的磁碟空間不足不會導致此錯誤。您應該
EXPLAIN
對查詢進行查詢以查看它正在使用什麼計劃。我通常會猜這是一個錯亂的 HashAgg,除了你的第一個查詢不應該做任何聚合、雜湊或其他方式。您也可以嘗試關閉過度使用。這樣,您的程序就會出現普通的記憶體不足錯誤,然後他們可以有序地報告這些錯誤,而不是僅僅從軌道上被核彈而沒有機會報告問題。有適用於 Linux 的說明,也許您可以將它們調整為適用於 MacOS。如果您可以在記憶體使用量開始大量增長時擷取該程序,但在它被殺死之前,您可以獲得有關記憶體分配位置的報告。