Daemon crashes daily due to a locking error

November 24, 2020

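For some time now, our MySQL has been crashing daily, sometimes several times a day. I have tried many things, but nothing seems to help. It is also a live server, so I have to be careful with testing and experimenting. I even cloned the entire server for more thorough testing, but I could not reproduce the error there, so I am fairly sure it takes at least some load on the server to trigger it.

We are on CentOS 6.8 x86_64, running MySQL 5.6.29 (we cannot upgrade due to circumstances). The machine has 8 GB of RAM, of which about 4 GB is usually "cached" memory and about 100 MB is actually unallocated. As far as I know, all databases run on the built-in version of InnoDB.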

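Error log

The database and .ibd file named in the first line are different every time, so it is not one specific database or table that is corrupted. If it were, the test server would crash as well.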

2017-02-10 02:39:47 24223 [ERROR] InnoDB: Unable to lock ./mydb/field_revision_field_tel_nr_.ibd, error: 37
2017-02-10 02:39:47 2ab04c081700  InnoDB: Assertion failure in thread 46936678209280 in file fil0fil.cc line 875
InnoDB: Failing assertion: ret
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
01:39:47 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=18
max_threads=300
thread_count=6
connection_count=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 151807 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xf906670
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 2ab04c080e10 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x91f1d5]
/usr/sbin/mysqld(handle_fatal_signal+0x3d8)[0x678e78]
/lib64/libpthread.so.0(+0xf7e0)[0x2aaf7aff57e0]
/lib64/libc.so.6(gsignal+0x35)[0x2aaf7c21c5e5]
/lib64/libc.so.6(abort+0x175)[0x2aaf7c21ddc5]
/usr/sbin/mysqld[0xa8645b]
/usr/sbin/mysqld[0xa8670e]
/usr/sbin/mysqld[0xa8d119]
/usr/sbin/mysqld[0xa57a3b]
/usr/sbin/mysqld[0xa580ab]
/usr/sbin/mysqld[0xa45b1a]
/usr/sbin/mysqld[0xa98716]
/usr/sbin/mysqld[0x940e8a]
/usr/sbin/mysqld[0x72cd36]
/usr/sbin/mysqld[0x73c1fd]
/usr/sbin/mysqld(_Z14get_all_tablesP3THDP10TABLE_LISTP4Item+0x665)[0x73c975]
/usr/sbin/mysqld(_Z24get_schema_tables_resultP4JOIN23enum_schema_table_state+0x2cd)[0x72942d]
/usr/sbin/mysqld(_ZN4JOIN14prepare_resultEPP4ListI4ItemE+0x6d)[0x71c2ad]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0xfd)[0x6d727d]
/usr/sbin/mysqld[0x71ee39]
/usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_P10SQL_I_ListI8st_orderESB_S7_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xbc)[0x71f8fc]
/usr/sbin/mysqld(_Z13handle_selectP3THDP13select_resultm+0x175)[0x71fb05]
/usr/sbin/mysqld[0x6f9929]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x34ae)[0x6fe01e]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x338)[0x701d48]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1231)[0x703761]
/usr/sbin/mysqld(_Z10do_commandP3THD+0xd7)[0x705037]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x116)[0x6cb956]
/usr/sbin/mysqld(handle_one_connection+0x45)[0x6cba35]
/usr/sbin/mysqld(pfs_spawn_thread+0x126)[0xaf56f6]
/lib64/libpthread.so.0(+0x7aa1)[0x2aaf7afedaa1]
/lib64/libc.so.6(clone+0x6d)[0x2aaf7c2d2aad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (2ab054008b90): is an invalid pointer
Connection ID (thread ID): 1435
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
170210 02:39:48 mysqld_safe Number of processes running now: 0
170210 02:39:48 mysqld_safe mysqld restarted
2017-02-10 02:39:49 0 [Note] /usr/sbin/mysqld (mysqld 5.6.29) starting as process 5207 ...
2017-02-10 02:39:50 5207 [Note] Plugin 'FEDERATED' is disabled.
2017-02-10 02:39:50 5207 [Note] InnoDB: Using atomics to ref count buffer pool pages
2017-02-10 02:39:50 5207 [Note] InnoDB: The InnoDB memory heap is disabled
2017-02-10 02:39:50 5207 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-02-10 02:39:50 5207 [Note] InnoDB: Memory barrier is not used
2017-02-10 02:39:50 5207 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-02-10 02:39:50 5207 [Note] InnoDB: Using Linux native AIO
2017-02-10 02:39:50 5207 [Note] InnoDB: Using CPU crc32 instructions
2017-02-10 02:39:50 5207 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2017-02-10 02:39:50 5207 [Note] InnoDB: Completed initialization of buffer pool
2017-02-10 02:39:50 5207 [Note] InnoDB: Highest supported file format is Barracuda.
2017-02-10 02:39:50 5207 [Note] InnoDB: The log sequence numbers 98028365051 and 98028365051 in ibdata files do not match the log sequence number 101119859781 in the ib_logfiles!
2017-02-10 02:39:50 5207 [Note] InnoDB: Database was not shutdown normally!
2017-02-10 02:39:50 5207 [Note] InnoDB: Starting crash recovery.
2017-02-10 02:39:50 5207 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-02-10 02:40:35 5207 [Note] InnoDB: Restoring possible half-written data pages
2017-02-10 02:40:36 5207 [Note] InnoDB: 128 rollback segment(s) are active.
2017-02-10 02:40:36 5207 [Note] InnoDB: Waiting for purge to start
2017-02-10 02:40:36 5207 [Note] InnoDB: 5.6.29 started; log sequence number 101119859781
2017-02-10 02:40:36 5207 [Note] Server hostname (bind-address): '*'; port: 3306
2017-02-10 02:40:36 5207 [Note] IPv6 is available.
2017-02-10 02:40:36 5207 [Note]   - '::' resolves to '::';
2017-02-10 02:40:36 5207 [Note] Server socket created on IP: '::'.
2017-02-10 02:40:37 5207 [Note] Event Scheduler: Loaded 1 event
2017-02-10 02:40:37 5207 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.29'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2017-02-10 02:40:37 5207 [Note] Event Scheduler: scheduler thread started with id 1
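
Since the failing file changes from crash to crash, it can help to pull every "Unable to lock" line out of the error log and compare them side by side. The following is a minimal sketch, not part of the original post: the function name `lock_failures` and the small errno table are illustrative assumptions. The table reflects the Linux errno numbering, where 37 is ENOLCK ("No locks available"); other operating systems number their error codes differently.

```python
import re

# Matches the InnoDB message format seen in the log above.
LOCK_ERROR = re.compile(
    r"\[ERROR\] InnoDB: Unable to lock (?P<path>\S+), error: (?P<errno>\d+)"
)

# Assumption: Linux errno numbering. Only the code seen in this log is listed.
LINUX_ERRNO = {37: "ENOLCK (No locks available)"}


def lock_failures(log_text):
    """Return (path, errno, description) for every lock failure in the log."""
    failures = []
    for match in LOCK_ERROR.finditer(log_text):
        code = int(match.group("errno"))
        failures.append(
            (match.group("path"), code, LINUX_ERRNO.get(code, "unknown"))
        )
    return failures


SAMPLE_LOG = (
    "2017-02-10 02:39:47 24223 [ERROR] InnoDB: Unable to lock "
    "./mydb/field_revision_field_tel_nr_.ibd, error: 37\n"
)

for path, code, description in lock_failures(SAMPLE_LOG):
    print(path, code, description)
```

Run against the full error log, this would list each affected tablespace file along with its error code, making it easy to confirm that the file differs while the code stays the same.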

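my.cnf

I ran the Percona Configuration Wizard to generate most of the configuration file. I am not entirely sure why it contains both innodb_file_per_table and innodb-file-per-table, but I have left it that way for now.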

[mysqld]
event_scheduler=on
local-infile=0
innodb_file_per_table = 1
max_allowed_packet = 256M
explicit_defaults_for_timestamp

# PERCONA WIZARD START #
key-buffer-size = 32M

# Caches/limits
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 300
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 10240

innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 64M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = 128M
# PERCONA WIZARD END #
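
MySQL treats dashes and underscores in option names as interchangeable, so innodb_file_per_table and innodb-file-per-table set the same option twice. The sketch below shows one way to spot such duplicates. It is an illustration only: the helper `find_duplicate_options` is a hypothetical name, and it deliberately ignores option groups (everything is treated as if it were in one section), which is enough for a single [mysqld] block like the one above.

```python
from collections import defaultdict


def find_duplicate_options(cnf_text):
    """Map each normalized option name to its spellings if set more than once."""
    seen = defaultdict(list)
    for raw_line in cnf_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and section headers such as [mysqld].
        if not line or line.startswith(("#", ";", "[")):
            continue
        name = line.split("=", 1)[0].strip()
        # Dashes and underscores are equivalent in MySQL option names.
        seen[name.replace("-", "_")].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}


# Shortened excerpt of the configuration shown above.
SAMPLE = """\
[mysqld]
innodb_file_per_table = 1
max_allowed_packet = 256M
explicit_defaults_for_timestamp
# PERCONA WIZARD START #
key-buffer-size = 32M
innodb-file-per-table = 1
innodb-buffer-pool-size = 128M
# PERCONA WIZARD END #
"""

print(find_duplicate_options(SAMPLE))
```

In this case both entries set the same value, so the duplication is harmless, but the same check would flag conflicting values as well.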

In addition, below is the relevant line from security/limits.conf. The number is higher than open-files-limit to give it some headroom, just in case.

mysql            -       nofile          81920
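
As a quick sanity check on that headroom, the limits.conf entry can be compared against open-files-limit from my.cnf. This is a small sketch under stated assumptions: the function names are hypothetical, and the parser expects the usual four whitespace-separated fields (domain, type, item, value) with a numeric value, so an entry such as "unlimited" would not be handled.

```python
def parse_limits_line(line):
    """Split one limits.conf entry into (domain, type, item, value)."""
    domain, limit_type, item, value = line.split()
    return domain, limit_type, item, int(value)


def nofile_headroom(limits_line, open_files_limit):
    """Return how many descriptors the OS limit leaves above MySQL's setting."""
    _domain, _limit_type, item, value = parse_limits_line(limits_line)
    if item != "nofile":
        raise ValueError("expected a nofile entry, got: " + item)
    return value - open_files_limit


headroom = nofile_headroom(
    "mysql            -       nofile          81920",  # line from limits.conf
    65535,  # open-files-limit from my.cnf
)
print(headroom)  # 16385
```

A negative result would mean the operating system limit is lower than what MySQL is configured to request.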

Any ideas on how to solve this?

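I actually figured this out sometime in 2017, but forgot to post an answer. Since it has been a while, I may have forgotten some details, but the following description should still be quite accurate.
To troubleshoot further, we decided to use Percona Monitoring and Management. I also enabled the standard slow query log. After a few days of data collection, we saw several things:
The InnoDB buffer pool was doing a lot of paging in both directions
Many pending reads/writes
Many queries were reported as slow for one particular database (a Magento site)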
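
One way to put a number on the buffer pool pressure described above is the share of logical reads that could not be served from memory. The sketch below uses the status counters Innodb_buffer_pool_read_requests (logical read requests) and Innodb_buffer_pool_reads (reads that had to go to disk), as reported by SHOW GLOBAL STATUS. The sample values are hypothetical and are not taken from the server in question.

```python
def buffer_pool_miss_ratio(read_requests, disk_reads):
    """Fraction of logical reads that InnoDB had to satisfy from disk."""
    if read_requests <= 0:
        raise ValueError("read_requests must be positive")
    return disk_reads / read_requests


# Hypothetical counter values, for illustration only.
status = {
    "Innodb_buffer_pool_read_requests": 1_000_000,
    "Innodb_buffer_pool_reads": 150_000,
}

ratio = buffer_pool_miss_ratio(
    status["Innodb_buffer_pool_read_requests"],
    status["Innodb_buffer_pool_reads"],
)
print(f"{ratio:.1%} of logical reads went to disk")
```

A persistently high ratio on a busy server suggests the working set does not fit in the buffer pool, which is consistent with a 128 MB pool on a machine serving a Magento site.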
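So we doubled the server's memory (to 16 GB) and significantly increased the InnoDB buffer pool size (from 128 MB to 6 GB), along with some other buffers. We then had to make sure nothing was corrupted, so we "simply" set innodb_force_recovery = 4, dumped and dropped all tables, restarted MySQL in normal mode, and imported everything again. **Keep in mind that recovery levels >= 4 can permanently corrupt data.** For us it still crashed at level 3, which is why we went to level 4.
Updated my.cnf (some directives may have been added later, but they should be unrelated to this particular issue):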
[mysqld]
event_scheduler = on
local-infile = 0
skip-host-cache
symbolic-links = 0
character_set_server = utf8
explicit_defaults_for_timestamp

sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
concurrent_insert = ALWAYS
low_priority_updates = 1
log-queries-not-using-indexes = 1

max_connections = 128
max_allowed_packet = 512M

read_rnd_buffer_size = 16M
join_buffer_size = 4M
sort_buffer_size = 8M

query_cache_type = 1
query_cache_size = 128M
query_cache_limit = 128M
max_heap_table_size = 512M
thread_cache_size = 8192

table_definition_cache = 4096
table_open_cache = 8192
tmp_table_size = 512M
max_tmp_tables = 4096

innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
innodb_thread_concurrency = 8
innodb_read_io_threads = 32
innodb_write_io_threads = 32

innodb_log_file_size = 256M
innodb_buffer_pool_size = 6G
innodb_log_buffer_size = 256M

innodb_monitor_enable = all
performance_schema = ON

### Tweaks for SSDs
# Default = 200
innodb_io_capacity = 3000

# Default = 2000
innodb_io_capacity_max = 6000
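As you can see, some variables were actually lowered (for example max_connections), while the buffers were generally increased. Since the server can now work (almost) entirely from RAM, connections and their queries should be handled quickly enough that the lower limit is not a problem. Many of the numbers here really depend on the server's specifications, the type of workload it has to handle, and whether it is dedicated to MySQL. Many people seem to recommend giving the buffer pool about 80% of RAM on a dedicated server, but ours is not dedicated and not that busy either. We can currently get away with slightly under 40%. At some point we may have to raise it to 60%, but at least we have enough RAM available for now.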
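
The percentages above follow directly from the configured values, as the short calculation below shows. The per-session figure is only a rough upper bound and an assumption on my part: it adds up the three session buffers set explicitly in the configuration and multiplies by max_connections, although MySQL allocates these buffers on demand and a single query may use more than one join buffer.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

ram = 16 * GIB   # server memory after the upgrade
pool = 6 * GIB   # innodb_buffer_pool_size

# Share of RAM currently given to the buffer pool.
share = pool / ram
print(f"buffer pool share: {share:.1%}")

# Buffer pool size if the share were raised to 60%.
pool_at_60_gib = 0.60 * ram / GIB
print(f"60% of RAM: {pool_at_60_gib:.1f} GiB")

# Rough ceiling for session buffers (assumption: every connection
# allocates each buffer once, at its full configured size).
per_session = (16 + 4 + 8) * MIB   # read_rnd + join + sort buffer sizes
session_ceiling_gib = 128 * per_session / GIB   # max_connections = 128
print(f"session buffer ceiling: {session_ceiling_gib:.1f} GiB")
```

Even in that unlikely worst case, the buffer pool and session buffers together stay below the 16 GB available, which matches the observation that there is RAM to spare for now.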
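What still strikes me as odd is that MySQL decides to hard abort, which can cause corruption somewhere (I seem to recall that it actually did a few times). Perhaps this is simply because the MySQL version was already fairly old at the time, and the issue has since been fixed in a later release.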

Source: https://dba.stackexchange.com/questions/163863
