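Daemon crashes daily due to locking error
For a while now, our MySQL has been crashing daily, sometimes several times a day. I have tried a lot of things, but nothing seems to help. It is also a live server, so I have to be careful with testing and experimenting. I even cloned the whole server to test more thoroughly, but I could not reproduce the error there, so I am fairly certain it at least requires a certain amount of load on the server.
We are on CentOS 6.8 x86_64, running MySQL 5.6.29 (due to circumstances we cannot upgrade). It has 8 GB of RAM, of which typically around 4 GB is "cached" memory and practically 100 MB is unallocated. As far as I know, all databases run on the built-in version of InnoDB.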
Error log
The database and `.ibd` file in the first line are different every time, so it is not one specific database or table that is corrupted. If it were, the test server would crash too.

    2017-02-10 02:39:47 24223 [ERROR] InnoDB: Unable to lock ./mydb/field_revision_field_tel_nr_.ibd, error: 37
    2017-02-10 02:39:47 2ab04c081700 InnoDB: Assertion failure in thread 46936678209280 in file fil0fil.cc line 875
    InnoDB: Failing assertion: ret
    InnoDB: We intentionally generate a memory trap.
    InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
    InnoDB: If you get repeated assertion failures or crashes, even
    InnoDB: immediately after the mysqld startup, there may be
    InnoDB: corruption in the InnoDB tablespace. Please refer to
    InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
    InnoDB: about forcing recovery.
    01:39:47 UTC - mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed,
    something is definitely wrong and this may fail.

    key_buffer_size=33554432
    read_buffer_size=131072
    max_used_connections=18
    max_threads=300
    thread_count=6
    connection_count=5
    It is possible that mysqld could use up to
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 151807 K bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.

    Thread pointer: 0xf906670
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 2ab04c080e10 thread_stack 0x40000
    /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x91f1d5]
    /usr/sbin/mysqld(handle_fatal_signal+0x3d8)[0x678e78]
    /lib64/libpthread.so.0(+0xf7e0)[0x2aaf7aff57e0]
    /lib64/libc.so.6(gsignal+0x35)[0x2aaf7c21c5e5]
    /lib64/libc.so.6(abort+0x175)[0x2aaf7c21ddc5]
    /usr/sbin/mysqld[0xa8645b]
    /usr/sbin/mysqld[0xa8670e]
    /usr/sbin/mysqld[0xa8d119]
    /usr/sbin/mysqld[0xa57a3b]
    /usr/sbin/mysqld[0xa580ab]
    /usr/sbin/mysqld[0xa45b1a]
    /usr/sbin/mysqld[0xa98716]
    /usr/sbin/mysqld[0x940e8a]
    /usr/sbin/mysqld[0x72cd36]
    /usr/sbin/mysqld[0x73c1fd]
    /usr/sbin/mysqld(_Z14get_all_tablesP3THDP10TABLE_LISTP4Item+0x665)[0x73c975]
    /usr/sbin/mysqld(_Z24get_schema_tables_resultP4JOIN23enum_schema_table_state+0x2cd)[0x72942d]
    /usr/sbin/mysqld(_ZN4JOIN14prepare_resultEPP4ListI4ItemE+0x6d)[0x71c2ad]
    /usr/sbin/mysqld(_ZN4JOIN4execEv+0xfd)[0x6d727d]
    /usr/sbin/mysqld[0x71ee39]
    /usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_P10SQL_I_ListI8st_orderESB_S7_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xbc)[0x71f8fc]
    /usr/sbin/mysqld(_Z13handle_selectP3THDP13select_resultm+0x175)[0x71fb05]
    /usr/sbin/mysqld[0x6f9929]
    /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x34ae)[0x6fe01e]
    /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x338)[0x701d48]
    /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1231)[0x703761]
    /usr/sbin/mysqld(_Z10do_commandP3THD+0xd7)[0x705037]
    /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x116)[0x6cb956]
    /usr/sbin/mysqld(handle_one_connection+0x45)[0x6cba35]
    /usr/sbin/mysqld(pfs_spawn_thread+0x126)[0xaf56f6]
    /lib64/libpthread.so.0(+0x7aa1)[0x2aaf7afedaa1]
    /lib64/libc.so.6(clone+0x6d)[0x2aaf7c2d2aad]

    Trying to get some variables.
    Some pointers may be invalid and cause the dump to abort.
    Query (2ab054008b90): is an invalid pointer
    Connection ID (thread ID): 1435
    Status: NOT_KILLED

    The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
    information that should help you find out what is causing the crash.
    170210 02:39:48 mysqld_safe Number of processes running now: 0
    170210 02:39:48 mysqld_safe mysqld restarted
    2017-02-10 02:39:49 0 [Note] /usr/sbin/mysqld (mysqld 5.6.29) starting as process 5207 ...
    2017-02-10 02:39:50 5207 [Note] Plugin 'FEDERATED' is disabled.
    2017-02-10 02:39:50 5207 [Note] InnoDB: Using atomics to ref count buffer pool pages
    2017-02-10 02:39:50 5207 [Note] InnoDB: The InnoDB memory heap is disabled
    2017-02-10 02:39:50 5207 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
    2017-02-10 02:39:50 5207 [Note] InnoDB: Memory barrier is not used
    2017-02-10 02:39:50 5207 [Note] InnoDB: Compressed tables use zlib 1.2.3
    2017-02-10 02:39:50 5207 [Note] InnoDB: Using Linux native AIO
    2017-02-10 02:39:50 5207 [Note] InnoDB: Using CPU crc32 instructions
    2017-02-10 02:39:50 5207 [Note] InnoDB: Initializing buffer pool, size = 128.0M
    2017-02-10 02:39:50 5207 [Note] InnoDB: Completed initialization of buffer pool
    2017-02-10 02:39:50 5207 [Note] InnoDB: Highest supported file format is Barracuda.
    2017-02-10 02:39:50 5207 [Note] InnoDB: The log sequence numbers 98028365051 and 98028365051 in ibdata files do not match the log sequence number 101119859781 in the ib_logfiles!
    2017-02-10 02:39:50 5207 [Note] InnoDB: Database was not shutdown normally!
    2017-02-10 02:39:50 5207 [Note] InnoDB: Starting crash recovery.
    2017-02-10 02:39:50 5207 [Note] InnoDB: Reading tablespace information from the .ibd files...
    2017-02-10 02:40:35 5207 [Note] InnoDB: Restoring possible half-written data pages
    2017-02-10 02:40:36 5207 [Note] InnoDB: 128 rollback segment(s) are active.
    2017-02-10 02:40:36 5207 [Note] InnoDB: Waiting for purge to start
    2017-02-10 02:40:36 5207 [Note] InnoDB: 5.6.29 started; log sequence number 101119859781
    2017-02-10 02:40:36 5207 [Note] Server hostname (bind-address): '*'; port: 3306
    2017-02-10 02:40:36 5207 [Note] IPv6 is available.
    2017-02-10 02:40:36 5207 [Note] - '::' resolves to '::';
    2017-02-10 02:40:36 5207 [Note] Server socket created on IP: '::'.
    2017-02-10 02:40:37 5207 [Note] Event Scheduler: Loaded 1 event
    2017-02-10 02:40:37 5207 [Note] /usr/sbin/mysqld: ready for connections.
    Version: '5.6.29' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL)
    2017-02-10 02:40:37 5207 [Note] Event Scheduler: scheduler thread started with id 1
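The first line of the crash carries the two details that vary or matter most: the tablespace file and the OS error number. On Linux, errno 37 is ENOLCK ("No locks available"), which concerns the operating system's file locking rather than the contents of a table. The following is a minimal sketch for pulling such lines out of an error log. The function name and the small errno table are illustrative only, and the table assumes Linux numbering.

```python
import re
from collections import Counter

# Assumption: Linux errno numbering; other platforms use different values.
LINUX_ERRNO_NAMES = {11: "EAGAIN", 13: "EACCES", 37: "ENOLCK"}

LOCK_ERROR = re.compile(
    r"\[ERROR\] InnoDB: Unable to lock (?P<path>\S+), error: (?P<errno>\d+)"
)

def find_lock_errors(log_text):
    """Return (path, errno, errno_name) for every 'Unable to lock' line."""
    results = []
    for match in LOCK_ERROR.finditer(log_text):
        number = int(match.group("errno"))
        name = LINUX_ERRNO_NAMES.get(number, "unknown")
        results.append((match.group("path"), number, name))
    return results

sample = (
    "2017-02-10 02:39:47 24223 [ERROR] InnoDB: Unable to lock "
    "./mydb/field_revision_field_tel_nr_.ibd, error: 37\n"
)
errors = find_lock_errors(sample)
for path, number, name in errors:
    print(f"{path}: errno {number} ({name})")

# Counting by errno shows whether every crash reports the same failure.
print(Counter(number for _, number, _ in errors))
```

Run over a full error log, this would show whether every crash reports the same errno even though the file differs each time.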
my.cnf
I ran the Percona configuration wizard to generate most of the config file. I am not entirely sure why it contains both `innodb_file_per_table` and `innodb-file-per-table`, but I have left it as is for now.

    [mysqld]
    event_scheduler=on
    local-infile=0
    innodb_file_per_table = 1
    max_allowed_packet = 256M
    explicit_defaults_for_timestamp

    # PERCONA WIZARD START #
    key-buffer-size = 32M

    # Caches/limits
    tmp-table-size = 32M
    max-heap-table-size = 32M
    query-cache-type = 0
    query-cache-size = 0
    max-connections = 300
    thread-cache-size = 50
    open-files-limit = 65535
    table-definition-cache = 4096
    table-open-cache = 10240
    innodb-flush-method = O_DIRECT
    innodb-log-files-in-group = 2
    innodb-log-file-size = 64M
    innodb-flush-log-at-trx-commit = 1
    innodb-file-per-table = 1
    innodb-buffer-pool-size = 128M
    # PERCONA WIZARD END #
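MySQL treats dashes and underscores in option names as interchangeable, so the two spellings of the file-per-table option name the same setting. Both set it to 1 here, so the duplicate is harmless. As a rough sketch, duplicates of this kind can be spotted by normalizing the names before comparing them. The helper below is hypothetical and deliberately simple: it ignores which section an option belongs to.

```python
from collections import defaultdict

def find_duplicate_options(cnf_text):
    """Group option spellings that normalize to the same option name."""
    seen = defaultdict(list)
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and section headers such as [mysqld].
        if not line or line.startswith(("#", ";", "[")):
            continue
        name = line.split("=", 1)[0].strip()
        # Dashes and underscores are equivalent in MySQL option names.
        seen[name.replace("-", "_")].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}

sample_cnf = """
[mysqld]
innodb_file_per_table = 1
# PERCONA WIZARD START #
innodb-flush-method = O_DIRECT
innodb-file-per-table = 1
# PERCONA WIZARD END #
"""
print(find_duplicate_options(sample_cnf))
```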
Additionally, below is the relevant line from `security/limits.conf`. The number is higher than `open-files-limit` to give it some headroom, just in case.

    mysql - nofile 81920
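Entries in limits.conf are applied through PAM, so a daemon started from an init script may not inherit them. It can therefore be worth checking what the running `mysqld` process actually received. On Linux, the effective limits are listed in `/proc/<pid>/limits`. The sketch below parses the "Max open files" row. The sample text is a hypothetical excerpt, not output from the server in question.

```python
import re

def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits, or None."""
    match = re.search(
        r"^Max open files\s+(\S+)\s+(\S+)", limits_text, re.MULTILINE
    )
    if match is None:
        return None
    # Each value is a number or the literal word "unlimited" (mapped to None).
    return tuple(None if v == "unlimited" else int(v) for v in match.groups())

def read_process_limits(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())

# Hypothetical excerpt, shaped like /proc/<pid>/limits on Linux.
sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max processes             31000                31000                processes\n"
    "Max open files            65535                65535                files\n"
)
soft, hard = parse_max_open_files(sample)
open_files_limit = 65535  # open-files-limit from my.cnf
covered = soft is None or soft >= open_files_limit
print("soft limit covers open-files-limit:", covered)
```

In practice `read_process_limits` would be called with the PID of the running `mysqld`.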
Any ideas on how to solve this?
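I actually figured this out somewhere in 2017, but I forgot to post an answer. Since it has been a while, I may have forgotten some details, but the following description should still be fairly accurate.
To investigate the problem further, we decided to use Percona Monitoring and Management. I also enabled the standard slow query log. After a few days of data collection, we saw several things:
- The InnoDB buffer pool was doing a lot of paging in both directions
- Many pending reads/writes
- For one particular database (a Magento website), many queries were reported as slow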
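The first two observations can also be approximated from the counters in `SHOW GLOBAL STATUS`, without a monitoring stack. `Innodb_buffer_pool_read_requests` counts logical reads, and `Innodb_buffer_pool_reads` counts those that had to go to disk. `Innodb_data_pending_reads` and `Innodb_data_pending_writes` show I/O that is currently waiting. The sketch below uses made-up counter values, not measurements from this server.

```python
def buffer_pool_miss_ratio(status):
    """Fraction of logical reads that InnoDB had to satisfy from disk."""
    requests = int(status["Innodb_buffer_pool_read_requests"])
    disk_reads = int(status["Innodb_buffer_pool_reads"])
    return disk_reads / requests if requests else 0.0

def pending_io(status):
    """Reads and writes currently waiting on the storage layer."""
    return (
        int(status["Innodb_data_pending_reads"]),
        int(status["Innodb_data_pending_writes"]),
    )

# Hypothetical values, as they might be copied from SHOW GLOBAL STATUS.
status = {
    "Innodb_buffer_pool_read_requests": "2000000",
    "Innodb_buffer_pool_reads": "300000",
    "Innodb_data_pending_reads": "12",
    "Innodb_data_pending_writes": "7",
}
print(f"miss ratio: {buffer_pool_miss_ratio(status):.1%}")
print("pending reads/writes:", pending_io(status))
```

A miss ratio that stays high under normal load suggests the buffer pool is too small for the working set, which matches what a 128 MB pool on a busy server would be expected to show.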
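So we doubled the server's memory (to 16 GB) and significantly increased the InnoDB buffer pool size (from 128 MB to 6 GB), along with some other buffers. Then we had to make sure nothing was corrupted, for which we "simply" set `innodb_force_recovery = 4`, dumped and dropped all tables, restarted MySQL in normal mode and imported everything again. **Keep in mind that recovery levels >= 4 can permanently corrupt data.** For us it would still crash at level 3, hence level 4. The updated my.cnf (some directives may have been added later, but they should be unrelated to this particular problem):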

    [mysqld]
    event_scheduler = on
    local-infile = 0
    skip-host-cache
    symbolic-links = 0
    character_set_server = utf8
    explicit_defaults_for_timestamp
    sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
    concurrent_insert = ALWAYS
    low_priority_updates = 1
    log-queries-not-using-indexes = 1
    max_connections = 128
    max_allowed_packet = 512M
    read_rnd_buffer_size = 16M
    join_buffer_size = 4M
    sort_buffer_size = 8M
    query_cache_type = 1
    query_cache_size = 128M
    query_cache_limit = 128M
    max_heap_table_size = 512M
    thread_cache_size = 8192
    table_definition_cache = 4096
    table_open_cache = 8192
    tmp_table_size = 512M
    max_tmp_tables = 4096
    innodb_file_per_table = 1
    innodb_flush_method = O_DIRECT
    innodb_thread_concurrency = 8
    innodb_read_io_threads = 32
    innodb_write_io_threads = 32
    innodb_log_file_size = 256M
    innodb_buffer_pool_size = 6G
    innodb_log_buffer_size = 256M
    innodb_monitor_enable = all
    performance_schema = ON

    ### Tweaks for SSDs
    # Default = 200
    innodb_io_capacity = 3000
    # Default = 2000
    innodb_io_capacity_max = 6000

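As you can see, some variables were actually decreased (e.g. `max_connections`), while the buffers were generally increased. Since the server can now work (almost) entirely from RAM, connections and their queries should be handled quickly enough that the lower limits do not become a problem. Many of the numbers here really depend on the server's specs, the type of workload it has to handle, and whether it is dedicated to MySQL. Many people seem to recommend giving the buffer pool about 80% of RAM on a dedicated server, but ours is not one of those, nor is it that busy. We can currently get away with just under 40%. At some point we may have to raise that to 60%, but at least we have enough RAM available for now. What I find strange is that MySQL decided to hard abort, which can cause corruption somewhere (I seem to recall it actually did a few times). Maybe this was simply because the MySQL version was already quite old at the time, and the problem has been fixed in a later version.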
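The percentages above follow from simple arithmetic on the buffer pool size and total RAM. This small sketch reproduces them, assuming both "16 GB" and `6G` are binary gigabytes (MySQL's `G` suffix means 1024^3 bytes). The function names are illustrative only.

```python
GIB = 1024 ** 3

def buffer_pool_share(buffer_pool_bytes, ram_bytes):
    """Buffer pool size as a fraction of total RAM."""
    return buffer_pool_bytes / ram_bytes

def buffer_pool_for_share(share, ram_bytes):
    """Buffer pool size in GiB for a target fraction of RAM."""
    return share * ram_bytes / GIB

ram = 16 * GIB
print(f"current: {buffer_pool_share(6 * GIB, ram):.1%}")
for target in (0.60, 0.80):
    size = buffer_pool_for_share(target, ram)
    print(f"{target:.0%} of RAM: {size:.1f} GiB")
```

A 6 GB pool on 16 GB of RAM is 37.5%, which is the "just under 40%" figure. Raising it to 60% would mean roughly 9.6 GB.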