MySQL

Percona XtraDB - random deadlocks

  • July 20, 2016

At random times my database gets completely locked up: queries hang and pile up until the server hits "Too many connections".

Below is my MySQL log; apart from the interrupted-system-call warnings I can't find anything related to the problem.

Aug 28 14:06:36 db1a mysqld: 2015-08-28 14:06:36 11786 [Note] WSREP: Created page /var/data/mysql/gcache.page.000008 of size 134217728 bytes
Aug 28 14:07:06 db1a mysqld: 2015-08-28 14:07:06 11786 [Note] WSREP: Deleted page /var/data/mysql/gcache.page.000008
Aug 28 16:39:38 db1a mysqld: 2015-08-28 16:39:38 11786 [Note] WSREP: Created page /var/data/mysql/gcache.page.000009 of size 134217728 bytes
Aug 28 16:39:48 db1a mysqld: 2015-08-28 16:39:48 11786 [Note] WSREP: Deleted page /var/data/mysql/gcache.page.000009
Aug 28 19:42:07 db1a mysqld: 2015-08-28 19:42:07 11786 [Note] WSREP: Created page /var/data/mysql/gcache.page.000010 of size 134217728 bytes
Aug 28 19:42:08 db1a mysqld: 2015-08-28 19:42:08 11786 [Note] WSREP: Created page /var/data/mysql/gcache.page.000011 of size 134217728 bytes
Aug 28 19:42:10 db1a mysqld: 2015-08-28 19:42:10 11786 [Warning] WSREP: Failed to report last committed 758795619, -4 (Interrupted system call)
Aug 28 19:42:45 db1a mysqld: 2015-08-28 19:42:45 11786 [Warning] WSREP: Failed to report last committed 758795879, -4 (Interrupted system call)
Aug 28 19:43:07 db1a mysqld: 2015-08-28 19:43:07 11786 [Warning] WSREP: Failed to report last committed 758796011, -4 (Interrupted system call)
Aug 28 19:43:11 db1a mysqld: 2015-08-28 19:43:11 11786 [Warning] WSREP: Failed to report last committed 758796012, -4 (Interrupted system call)
Aug 28 19:43:49 db1a mysqld: 2015-08-28 19:43:49 11786 [Warning] Too many connections
Aug 28 19:43:49 db1a mysqld: 2015-08-28 19:43:49 11786 [Warning] Too many connections
Aug 28 19:43:50 db1a mysqld: 2015-08-28 19:43:50 11786 [Warning] Too many connections
Aug 28 19:43:51 db1a mysqld: 2015-08-28 19:43:51 11786 [Warning] Too many connections

MySQL starts throwing this error in my application:

SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

Version: 5.6.24-72.2-56-log Percona XtraDB Cluster (GPL), Release rel72.2, Revision 43abf03, WSREP version 25.11, wsrep_25.11
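When 1205 errors like this appear, it helps to see which transactions are actually blocking each other at that moment. A sketch using the InnoDB information_schema tables that ship with 5.6 (these tables were replaced by `performance_schema.data_lock_waits` in 8.0, so this applies to the version in the question only):

```sql
-- Show each waiting transaction alongside the transaction blocking it.
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       r.trx_wait_started    AS wait_started,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM   information_schema.innodb_lock_waits w
JOIN   information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN   information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;
```

One Galera-specific caveat: on a cluster node the "blocking" side can be a replicated writeset applying in the background (a brute-force abort situation) rather than a local client transaction, in which case the blocking query column may be NULL even though the wait is real.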

My my.cnf:

# -- SERVER ---------------------------------------------- #

[mysqld_safe]

pid-file        = /var/data/run/mysqld.pid
socket          = /var/data/run/mysqld.sock
nice            = 0 
flush_caches    = 1 
numa_interleave = 1 
syslog

[mysqld]

user        = mysql
pid-file    = /var/data/run/mysqld.pid
socket      = /var/data/run/mysqld.sock
port        = 3306
basedir     = /usr
datadir     = /var/data/mysql
tmpdir      = /mnt/tmp

lc-messages-dir = /usr/share/mysql
skip-external-locking

default_time_zone = America/New_York

character-set-server    = utf8
collation-server        = utf8_general_ci

transaction-isolation   = READ-COMMITTED

# -- Cluster Settings -------------------------- #

# Path to Galera library
wsrep_provider = /usr/lib/libgalera_smm.so

# Cluster connection URL contains the IPs of node#1, node#2 and node#3
# It should be empty during bootstrap
wsrep_cluster_address=gcomm://10.0.200.8,10.0.210.7
#wsrep_cluster_address=gcomm://

# In order for Galera to work correctly binlog format should be ROW
binlog_format = ROW

# MyISAM storage engine has only experimental support
default_storage_engine = InnoDB

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode = 2

# We don't trust auto_increment control from galera when nodes are removed and added
# to the cluster, each node has a different offset.
wsrep_auto_increment_control = OFF
auto_increment_increment = 3
auto_increment_offset    = 1

# Node #1 address
wsrep_node_address = 10.0.200.7

# Cluster and node name
wsrep_cluster_name = db1
wsrep_node_name = db1a

# SST method
wsrep_sst_method            = xtrabackup-v2
wsrep_sst_auth              = "db1:XXXXXXXXXXXX"
wsrep_sst_receive_address   = 10.0.200.7
wsrep_sst_donor             = db1b,db1c

wsrep_slave_threads = 4

# ---------------------------------------------- #

# 
# * Timeouts
#
connect_timeout          = 5
lock_wait_timeout        = 3600
interactive_timeout      = 1800
wait_timeout             = 3600

#
# * Buffer
#
key_buffer_size      = 32M
sort_buffer_size     = 4M
read_rnd_buffer_size = 4M
join_buffer_size     = 4M

max_allowed_packet   = 64M
thread_stack         = 512K
thread_cache_size    = 12

table_open_cache     = 4096
open_files_limit     = 65536
max_heap_table_size  = 1G
tmp_table_size       = 1G

myisam-recover       = BACKUP
max_connections      = 500

#
# * Query Cache Configuration
#
# Cache size needs to be set to 0 before starting with XtraDB Cluster
# It can be changed at runtime
# http://www.percona.com/doc/percona-xtradb-cluster/5.6/limitation.html

query_cache_type  = 1
query_cache_limit = 10M
query_cache_size  = 0

#
# * Logging and Replication
#

# It has to be logged to FILE to work with XtraDB Cluster
# http://www.percona.com/doc/percona-xtradb-cluster/5.6/limitation.html
log_output = FILE

#general_log_file        = /var/log/mysql/mysql.log
#general_log             = 1

#log_error = /var/log/mysql/error.log

slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 10
#log-queries-not-using-indexes

#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
#       other settings you may need to change.

server-id           = 1
log_bin             = /var/data/log/mysql-bin.log
expire_logs_days    = 10
max_binlog_size     = 512M
log_bin_trust_function_creators = 1
log-slave-updates

innodb_buffer_pool_size         = 48G
innodb_buffer_pool_instances    = 10
innodb_log_file_size            = 1G
innodb_log_buffer_size          = 256M
innodb_thread_concurrency       = 0
innodb_file_format              = Barracuda
innodb_flush_method             = O_DIRECT

innodb_lock_wait_timeout        = 60

innodb_read_io_threads          = 64
innodb_write_io_threads         = 32

innodb_ft_enable_stopword       = 0
innodb_ft_min_token_size        = 2

innodb_flush_log_at_trx_commit  = 0

innodb_open_files               = 4096

# NUMA improvement
innodb_buffer_pool_populate     = 1

innodb_file_per_table

#
# * Security
#
ssl-ca    = /etc/ssl/certs/CA.crt
ssl-cert  = /etc/mysql/keys/db1a.crt
ssl-key   = /etc/mysql/keys/db1a.key

I don't think you're alone. We stopped using multi-master replication and instead beefed up RAID and XtraBackup; we've had no problems on 5.5 or 5.6.

I've read that these problems creep in once you use wsrep and clustering. I'm not sure there's a fix, but your best bet may be to follow this thread:

I hope this helps; that article probably has the best commands for diagnosing your problem, straight from the mouths of Percona engineers.

To summarize the Percona engineers' tips: check the error log, of course, and run:

show status like 'wsrep%';
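Beyond eyeballing the full wsrep output, a few counters are particularly telling for a cluster that periodically stalls; Galera's flow control pauses all local commits whenever a node's receive queue backs up, which looks exactly like a random lock-up from the application's side. A sketch (variable names as in Galera 25.x, which matches the version in the question):

```sql
-- Fraction of time since the last FLUSH STATUS that replication
-- flow control had the node paused (close to 1.0 = stalled):
SHOW STATUS LIKE 'wsrep_flow_control_paused';

-- Average depth of the receive queue; consistently > 0 means this
-- node can't apply writesets as fast as it receives them:
SHOW STATUS LIKE 'wsrep_local_recv_queue_avg';

-- Sanity checks that the node is still a synced cluster member:
SHOW STATUS LIKE 'wsrep_local_state_comment';
SHOW STATUS LIKE 'wsrep_cluster_status';
```

If `wsrep_flow_control_paused` is high during an incident, the hang is cluster-wide back-pressure rather than an ordinary InnoDB lock conflict, and tuning `wsrep_slave_threads` or the slowest node is the place to look.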

Plus some of the tools mentioned in the article. One thing we do in our Ansible playbooks for every server running database software is raise the maximum open-file limit (ulimit -n), among other settings.
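For reference, that limit bump is usually made persistent rather than set with ulimit in a shell. A sketch for a sysvinit/mysqld_safe setup (the path and values are illustrative; 65536 matches the open_files_limit already in the my.cnf above):

```
# /etc/security/limits.d/mysql.conf
# Raise the open-file limit for the mysql user so mysqld can actually
# use the open_files_limit / innodb_open_files configured in my.cnf.
mysql  soft  nofile  65536
mysql  hard  nofile  65536
```

On systemd distributions the equivalent is `LimitNOFILE=65536` in a unit override, since PAM limits don't apply to services started by systemd.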

I'm not sure whether it will help, but Peter from Percona suggests setting a connection limit explicitly when granting privileges. I haven't tried it, but it might also help you diagnose where the problem lies, or whether it really is cluster-related:

mysql> GRANT USAGE ON *.* TO 'batchjob1'@'localhost'
   ->     WITH MAX_USER_CONNECTIONS 10;

I recently did some extensive benchmarking on Percona XtraDB Cluster 5.6.21-70 and noticed performance degrading beyond 32 threads when running transactions (inserts, updates, deletes) against the whole cluster. I then ran the same tests against a single node, and the degradation at 32 threads disappeared. Instead of pointing your application at the cluster, point it at one dedicated node. We use Linux LVS for failover and it performs well.
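A minimal sketch of that single-writer setup using keepalived's LVS support, reusing the node IPs from the my.cnf in the question (the virtual IP 10.0.200.100 and the check parameters are made up for illustration; the answer above doesn't specify its exact LVS configuration):

```
# keepalived.conf fragment: send all MySQL traffic to db1a,
# falling back to db1b only when db1a stops answering on 3306.
virtual_server 10.0.200.100 3306 {      # hypothetical virtual IP
    delay_loop 6
    lb_algo rr
    lb_kind DR
    protocol TCP

    real_server 10.0.200.7 3306 {       # db1a, the dedicated writer
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }

    # Keep the other nodes out of the active pool so only one node
    # ever takes writes; sorry_server is used only when no real
    # server passes its health check.
    sorry_server 10.0.200.8 3306        # db1b, failover target
}
```

Funneling writes through one node avoids cross-node certification conflicts entirely, at the cost of using the other nodes only for reads and failover.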

Source: https://dba.stackexchange.com/questions/112582