PostgreSQL

How do I fix "could not write to file pg_wal/xlogtemp" after a failed command?

  • March 17, 2022

I ran vacuum full analyze, but it took too long and was using 100% of the RAM, so I killed it. Now Postgres won't start. I get:

* Starting PostgreSQL 14 database server                                                                                            
* Error: /usr/lib/postgresql/14/bin/pg_ctl /usr/lib/postgresql/14/bin/pg_ctl start -D /var/lib/postgresql/14/main -l /var/log/postgresql/postgresql-14-main.log -s -o  -c config_file="/etc/postgresql/14/main/postgresql.conf"  exited with status 1:
2022-03-15 22:51:01.259 PDT [915] LOG:  starting PostgreSQL 14.2 (Ubuntu 14.2-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc 
(Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2022-03-15 22:51:01.259 PDT [915] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-03-15 22:51:01.259 PDT [915] LOG:  listening on IPv6 address "::", port 5432
2022-03-15 22:51:01.263 PDT [915] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-03-15 22:51:01.271 PDT [916] LOG:  database system was interrupted while in recovery at 2022-03-15 21:52:50 PDT
2022-03-15 22:51:01.271 PDT [916] HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
2022-03-15 22:51:02.578 PDT [917] postgres@foo FATAL:  the database system is starting up
2022-03-15 22:51:02.595 PDT [918] postgres@foo FATAL:  the database system is starting up
2022-03-15 22:51:02.612 PDT [919] postgres@foo FATAL:  the database system is starting up
2022-03-15 22:51:02.641 PDT [920] postgres@foo FATAL:  the database system is starting up
2022-03-15 22:51:02.656 PDT [921] postgres@foo FATAL:  the database system is starting up
2022-03-15 22:51:02.683 PDT [922] postgres@foo FATAL:  the database system is starting up
2022-03-15 22:51:03.570 PDT [916] LOG:  database system was not properly shut down; automatic recovery in progress
2022-03-15 22:51:03.573 PDT [916] LOG:  redo starts at 36/21030540
2022-03-15 22:51:04.368 PDT [916] LOG:  redo done at 36/2EFFF1D8 system usage: CPU: user: 0.40 s, system: 0.38 s, elapsed: 0.79 s    
2022-03-15 22:51:04.376 PDT [916] PANIC:  could not write to file "pg_wal/xlogtemp.916": No space left on device
2022-03-15 22:51:04.378 PDT [915] LOG:  startup process (PID 916) was terminated by signal 6: Aborted
2022-03-15 22:51:04.378 PDT [915] LOG:  aborting startup due to startup process failure
2022-03-15 22:51:04.386 PDT [915] LOG:  database system is shut down
pg_ctl: could not start server
Examine the log output.

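The database is small and should not come close to filling the disk, so vacuum full analyze must have created a lot of junk files. What is the correct way to fix this? I don't know how to use pg_archivecleanup, I can't find an archive directory, and /var/lib/postgresql/14/main/pg_xlog doesn't exist. This is everything in the folder: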

sudo ls /var/lib/postgresql/14/main/
PG_VERSION  pg_commit_ts  pg_multixact  pg_serial     pg_stat_tmp  pg_twophase postgresql.auto.conf
base        pg_dynshmem   pg_notify     pg_snapshots  pg_subtrans  pg_wal      postmaster.opts
global      pg_logical    pg_replslot   pg_stat       pg_tblspc    pg_xact

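This is a development database, so in the worst case I can delete everything. But I would like to know how else I could fix this if it happened in production.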

OS: Ubuntu 20 on WSL

Evidently you ran out of disk space in pg_wal.

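To recover:

  1. Increase the disk size (do not delete any files).
  2. Start PostgreSQL.
  3. pg_dumpall the cluster.
  4. Stop the PostgreSQL server and remove the data directory (save the configuration files first).
  5. Create a new cluster with initdb, add the configuration, and start it.
  6. Restore the dump from step 3.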

This will get rid of the files left behind by the crash.
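Steps 2 to 6 could be sketched roughly as the shell function below. It is only an illustration under several assumptions: a Debian/Ubuntu install where the pg_ctlcluster wrapper manages the cluster "14 main" (the paths are taken from the log in the question), a main configuration that lives in /etc/postgresql/14/main and therefore survives removal of the data directory, and a hypothetical dump location DUMP on a filesystem with enough free space that the postgres user can write to. Because step 4 is destructive, the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the dump-and-recreate procedure (steps 2-6). Not a tested recipe.
PGBIN=/usr/lib/postgresql/14/bin      # from the pg_ctl path in the log
PGDATA=/var/lib/postgresql/14/main    # from the -D option in the log
DUMP=${DUMP:-/mnt/backup/all.sql}     # hypothetical path, must be outside PGDATA
DRY_RUN=${DRY_RUN:-1}                 # default: print the commands, run nothing

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

rebuild_cluster() {
    # Commands are chained with && so a failed step stops the sequence
    # before anything is deleted.
    run sudo pg_ctlcluster 14 main start &&                               # step 2
    run sudo -u postgres pg_dumpall -f "$DUMP" &&                         # step 3
    run sudo pg_ctlcluster 14 main stop &&                                # step 4
    run sudo cp -a "$PGDATA/postgresql.auto.conf" "$DUMP.auto.conf" &&    # keep config
    run sudo rm -rf "$PGDATA" &&
    run sudo -u postgres "$PGBIN/initdb" -D "$PGDATA" &&                  # step 5
    run sudo pg_ctlcluster 14 main start &&
    run sudo -u postgres psql -f "$DUMP" postgres                         # step 6
}
```

Calling rebuild_cluster with the default settings prints the eight commands for review; running it with DRY_RUN=0 executes them. Check that the dump file is complete before letting the rm step run, and copy any settings you need from the saved postgresql.auto.conf into the new cluster before the final start.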

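Next time, don't run VACUUM (FULL) unless you have a clear indication that it is needed, and in particular don't run it on every table in the database, only on the tables that really need it.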

Note: to gather statistics, run ANALYZE. And VACUUM (FULL) is not a better version of VACUUM; it is something entirely different.
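To illustrate the per-table approach, here is a small sketch that generates one statement per named table instead of running a database-wide VACUUM (FULL). The helper name vacuum_full_sql and the table names in the usage note are hypothetical, and the helper assumes plain identifiers that need no quoting.

```shell
# Emit one VACUUM (FULL, ANALYZE) statement per table named on the command line.
# VACUUM (FULL) rewrites the whole table into a new file, so each table needs
# roughly its own size in free disk space while the statement runs.
vacuum_full_sql() {
    for tbl in "$@"; do
        printf 'VACUUM (FULL, ANALYZE) %s;\n' "$tbl"
    done
}
```

The output can be piped into psql, for example `vacuum_full_sql orders | sudo -u postgres psql -d foo`, where foo is the database name seen in the log. If the goal is only fresh planner statistics, `sudo -u postgres psql -d foo -c 'ANALYZE;'` is enough and rewrites nothing.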

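Find enough files to delete or move elsewhere so that the system has enough room to start. Once it stays up long enough to start and complete a checkpoint, it will probably free more space on its own. In the listing you showed, I don't see a directory for the human-readable log files (named "log" by default in recent versions and "pg_log" in older ones). That is usually the best candidate for relocation. Failing that, what about other files elsewhere on the same partition, but outside the PostgreSQL data directory?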
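A quick way to look for candidates is to check the free space on the partition and list the largest entries in a directory. The sketch below uses only df, du, sort and head; the function name disk_report is made up for illustration. In this setup the server log is written to /var/log/postgresql (see the -l option in the startup command), so that directory is worth inspecting as well.

```shell
# Show free space on the filesystem holding a directory, then its largest
# entries (sizes in KiB, biggest first). Second argument: how many to list.
disk_report() {
    dir=$1
    df -h "$dir"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, `disk_report /var/log/postgresql` or `disk_report /var/lib/postgresql/14/main`, run from a shell that has read access to those directories (for the data directory that means root or the postgres user). Only move or delete files outside the data directory; everything under pg_wal and base belongs to the cluster and must be left alone.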

Source: https://dba.stackexchange.com/questions/308782