Mysql
Percona XtraDB - 隨機死鎖
隨機我的數據庫被完全鎖定並且查詢掛起並且它們堆積起來直到它得到“太多的連接”。
下面是我的 mysql 日誌文件,我找不到與錯誤系統呼叫相關的任何內容。
Aug 28 14:06:36 db1a mysqld: 2015-08-28 14:06:36 11786 [Note] WSREP: Created page /var/data/mysql/gcache.page.000008 of size 134217728 bytes Aug 28 14:07:06 db1a mysqld: 2015-08-28 14:07:06 11786 [Note] WSREP: Deleted page /var/data/mysql/gcache.page.000008 Aug 28 16:39:38 db1a mysqld: 2015-08-28 16:39:38 11786 [Note] WSREP: Created page /var/data/mysql/gcache.page.000009 of size 134217728 bytes Aug 28 16:39:48 db1a mysqld: 2015-08-28 16:39:48 11786 [Note] WSREP: Deleted page /var/data/mysql/gcache.page.000009 Aug 28 19:42:07 db1a mysqld: 2015-08-28 19:42:07 11786 [Note] WSREP: Created page /var/data/mysql/gcache.page.000010 of size 134217728 bytes Aug 28 19:42:08 db1a mysqld: 2015-08-28 19:42:08 11786 [Note] WSREP: Created page /var/data/mysql/gcache.page.000011 of size 134217728 bytes Aug 28 19:42:10 db1a mysqld: 2015-08-28 19:42:10 11786 [Warning] WSREP: Failed to report last committed 758795619, -4 (Interrupted system call) Aug 28 19:42:45 db1a mysqld: 2015-08-28 19:42:45 11786 [Warning] WSREP: Failed to report last committed 758795879, -4 (Interrupted system call) Aug 28 19:43:07 db1a mysqld: 2015-08-28 19:43:07 11786 [Warning] WSREP: Failed to report last committed 758796011, -4 (Interrupted system call) Aug 28 19:43:11 db1a mysqld: 2015-08-28 19:43:11 11786 [Warning] WSREP: Failed to report last committed 758796012, -4 (Interrupted system call) Aug 28 19:43:49 db1a mysqld: 2015-08-28 19:43:49 11786 [Warning] Too many connections Aug 28 19:43:49 db1a mysqld: 2015-08-28 19:43:49 11786 [Warning] Too many connections Aug 28 19:43:50 db1a mysqld: 2015-08-28 19:43:50 11786 [Warning] Too many connections Aug 28 19:43:51 db1a mysqld: 2015-08-28 19:43:51 11786 [Warning] Too many connections
MySQL 開始在我的應用程序中拋出此錯誤:
'SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction
版本:5.6.24-72.2-56-log Percona XtraDB Cluster (GPL),版本 rel72.2,修訂版 43abf03,WSREP 版本 25.11,wsrep_25.11
我的.cnf
# -- SERVER ---------------------------------------------- # [mysqld_safe] pid-file = /var/data/run/mysqld.pid socket = /var/data/run/mysqld.sock nice = 0 flush_caches = 1 numa_interleave = 1 syslog [mysqld] user = mysql pid-file = /var/data/run/mysqld.pid socket = /var/data/run/mysqld.sock port = 3306 basedir = /usr datadir = /var/data/mysql tmpdir = /mnt/tmp lc-messages-dir = /usr/share/mysql skip-external-locking default_time_zone = America/New_York character-set-server = utf8 collation-server = utf8_general_ci transaction-isolation = READ-COMMITTED # -- Cluster Settings -------------------------- # # Path to Galera library wsrep_provider = /usr/lib/libgalera_smm.so # Cluster connection URL contains the IPs of node#1, node#2 and node#3 # It should be empty during bootstrap wsrep_cluster_address=gcomm://10.0.200.8,10.0.210.7 #wsrep_cluster_address=gcomm:// # In order for Galera to work correctly binlog format should be ROW binlog_format = ROW # MyISAM storage engine has only experimental support default_storage_engine = InnoDB # This changes how InnoDB autoincrement locks are managed and is a requirement for Galera innodb_autoinc_lock_mode = 2 # We don't trust auto_increment control from galera when nodes are removed and added # to the cluster, each node has a different offset. wsrep_auto_increment_control = OFF auto_increment_increment = 3 auto_increment_offset = 1 # Node #1 address wsrep_node_address = 10.0.200.7 # Cluster and node name wsrep_cluster_name = db1 wsrep_node_name = db1a # SST method wsrep_sst_method = xtrabackup-v2 wsrep_sst_auth = "db1:XXXXXXXXXXXX" wsrep_sst_receive_address = 10.0.200.7 wsrep_sst_donor = db1b,db1c wsrep_slave_threads = 4 # ---------------------------------------------- # # # * Timeouts # connect_timeout = 5 lock_wait_timeout = 3600 interactive_timeout = 1800 wait_timeout = 3600 # # * Buffer # key_buffer_size = 32M sort_buffer_size = 4M read_rnd_buffer_size = 4M join_buffer_size = 4M max_allowed_packet = 64M thread_stack = 512K thread_cache_size = 12 table_open_cache = 4096 open_files_limit = 65536 max_heap_table_size = 1G tmp_table_size = 1G myisam-recover = BACKUP max_connections = 500 # # * Query Cache Configuration # # Cache size needs to be set to 0 before start with XtrabDB cluster # It can ben changed during runtime # http://www.percona.com/doc/percona-xtradb-cluster/5.6/limitation.html query_cache_type = 1 query_cache_limit = 10M query_cache_size = 0 # # * Logging and Replication # # It has to be logged to FILE to work with XtraDB Cluster # http://www.percona.com/doc/percona-xtradb-cluster/5.6/limitation.html log_output = FILE #general_log_file = /var/log/mysql/mysql.log #general_log = 1 #log_error = /var/log/mysql/error.log slow_query_log = 1 slow_query_log_file = /var/log/mysql/slow.log long_query_time = 10 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. server-id = 1 log_bin = /var/data/log/mysql-bin.log expire_logs_days = 10 max_binlog_size = 512M log_bin_trust_function_creators = 1 log-slave-updates innodb_buffer_pool_size = 48G innodb_buffer_pool_instances = 10 innodb_log_file_size = 1G innodb_log_buffer_size = 256M innodb_thread_concurrency = 0 innodb_file_format = Barracuda innodb_flush_method = O_DIRECT innodb_lock_wait_timeout = 60 innodb_read_io_threads = 64 innodb_write_io_threads = 32 innodb_ft_enable_stopword = 0 innodb_ft_min_token_size = 2 innodb_flush_log_at_trx_commit = 0 innodb_open_files = 4096 # NUMA improvement innodb_buffer_pool_populate = 1 innodb_file_per_table # # * Security # ssl-ca = /etc/ssl/certs/CA.crt ssl-cert = /etc/mysql/keys/db1a.crt ssl-key = /etc/mysql/keys/db1a.key
我認為你並不孤單。我們停止了多主複製,而是加強了 RAID 和 XtraBackup,5.5 或 5.6 沒有問題。
我已經讀過,一旦您使用 wsrep 和集群,這些問題就會蔓延。不確定他們是否有解決方案,但您最好的選擇可能是關注此執行緒:
我希望這會有所幫助,並且該文章可能有最好的“來自 Percona 工程師的口”的命令來診斷您的問題。
總結 Percona 工程師的技巧,當然檢查錯誤日誌和:
show status like 'wsrep%';
還有文章中提到的一些工具。我們在 Ansible 劇本中為所有執行數據庫軟體的伺服器提供的一件事是增加最大打開文件限制 (ulimit -n) 等。
我不確定它是否會有所幫助,但來自 Percona 的 Peter 建議在授予權限時明確設置 max_connections。我沒有嘗試過,但這也可能有助於嘗試診斷問題出在哪裡,或者它是否確實與集群有關。
mysql> GRANT USAGE ON *.* TO 'batchjob1'@'localhost' -> WITH MAX_USER_CONNECTIONS 10;
我最近在 Percona XtraDB Cluster 5.6.21-70 上做了一些廣泛的基準測試,並註意到在對集群執行事務(插入、更新、刪除)時,在 32 個執行緒後性能下降。然後我只對 1 個節點執行測試,32 執行緒的降級消失了。不要將您的應用程序指向集群,而是將其指向一個專用節點。我們使用 Linux LVS 進行故障轉移,它執行良好。