Galera

MariaDB 10.1 Galera Cluster error

  • September 20, 2018

I am trying to set up a MariaDB Galera Cluster with 2 nodes:

Node 1 / 172.23.0.2:

wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
binlog_format=ROW
wsrep_cluster_address='gcomm://'
wsrep_sst_receive_address = '172.23.0.2:4444'
wsrep_cluster_name='cluster'
wsrep_node_name='n_01'
wsrep_sst_method=rsync
wsrep_sst_auth=cluster_user:cluster_pass

Node 2 / 172.23.0.3:

wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
binlog_format=ROW
wsrep_cluster_address='gcomm://172.23.0.2'
wsrep_sst_receive_address = '172.23.0.3:4444'
wsrep_cluster_name='cluster'
wsrep_node_name='n_02'
wsrep_sst_method=rsync
wsrep_sst_auth=cluster_user:cluster_pass

The first node starts without errors:

Variable_name         Value    
--------------------  ---------
wsrep_cluster_size    1        
wsrep_cluster_status  Primary  
wsrep_connected       ON       
wsrep_ready           ON    

But when I start the second node, I get this:

mariadb.service - MariaDB database server
  Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
 Drop-In: /etc/systemd/system/mariadb.service.d
          └─migrated-from-my.cnf-settings.conf
  Active: failed (Result: exit-code) since jeu. 2017-08-24 19:11:32 CEST; 14s ago
 Process: 14656 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Process: 20222 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
 Process: 18861 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
 Process: 18858 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 20222 (code=exited, status=1/FAILURE)
  Status: "MariaDB server is down"
  CGroup: /system.slice/mariadb.service
          ├─20357 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 172.23.0.3 --datadir /home/mysql/ --parent 20309 --binlog /var/log/mariadb/binlog/mysql_binlog
          ├─20391 rsync --daemon --no-detach --port 4444 --config /home/mysql//rsync_sst.conf
          ├─22006 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 172.23.0.3 --datadir /home/mysql/ --parent 21997 --binlog /var/log/mariadb/binlog/mysql_binlog
          ├─22638 sleep 0.2
          └─22648 sleep 0.2

août 24 19:11:23 ovh38 mysqld[20222]: 2017-08-24 19:11:23 127079663351552 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '172.23.0.3:4444' --datadir '/home/mysql/'   --pa...log/mysql_binlog'
août 24 19:11:23 ovh38 mysqld[20222]: Read: '(null)'
août 24 19:11:23 ovh38 mysqld[20222]: 2017-08-24 19:11:23 127079663351552 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.23.0.3:4444' --datadir '/home/mysql/'   --parent '...eady in progress)
août 24 19:11:23 ovh38 mysqld[20222]: 2017-08-24 19:11:23 127080155712256 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
août 24 19:11:23 ovh38 mysqld[20222]: 2017-08-24 19:11:23 127080155712256 [ERROR] Aborting
août 24 19:11:32 ovh38 mysqld[20222]: Error in my_thread_global_end(): 1 threads didn't exit
août 24 19:11:32 ovh38 systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
août 24 19:11:32 ovh38 systemd[1]: Failed to start MariaDB database server.
août 24 19:11:32 ovh38 systemd[1]: Unit mariadb.service entered failed state.
août 24 19:11:32 ovh38 systemd[1]: mariadb.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

Update:

This error occurred because an rsync process was already running and holding port 4444, so the solution was to kill it:

Proto Recv-Q Send-Q Adresse locale          Adresse distante        Etat        PID/Program name
tcp        0      0 0.0.0.0:21              0.0.0.0:*               LISTEN      1087/proftpd: (acce
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      708/sshd
tcp        0      0 0.0.0.0:4444            0.0.0.0:*               LISTEN      15510/rsync
tcp6       0      0 :::80                   :::*                    LISTEN      19059/httpd
tcp6       0      0 :::22                   :::*                    LISTEN      708/sshd
tcp6       0      0 :::443                  :::*                    LISTEN      19059/httpd
tcp6       0      0 :::4444                 :::*                    LISTEN      15510/rsync
tcp6       0      0 :::545                  :::*                    LISTEN      19059/httpd


# kill -9 15510
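
The check above can be sketched as a small shell helper. This is only an illustration and not part of the original post: `pids_on_port` is a hypothetical function name, and it assumes `netstat -ltnp` output in the same column layout as the listing above (local address in column 4, state in column 6, PID/program in column 7).

```shell
# Hypothetical helper (not from the original post): read `netstat -ltnp`
# output on stdin and print the PID of every process listening on the
# given TCP port, one PID per line, without duplicates.
pids_on_port() {
    awk -v suffix=":$1" '
        # Keep only listening sockets whose local address ends in ":<port>".
        $6 == "LISTEN" &&
        substr($4, length($4) - length(suffix) + 1) == suffix {
            split($7, parts, "/")   # "15510/rsync" -> parts[1] = "15510"
            print parts[1]
        }' | sort -u
}

# Possible usage on the joiner node, as root:
#   netstat -ltnp | pids_on_port 4444
# Inspect each PID (for example with `ps -o pid,args -p PID`) before
# killing it. A plain `kill PID` (SIGTERM) lets the process exit cleanly;
# `kill -9` is the fallback if it does not react.
```

The `systemctl status` output also lists several leftover `wsrep_sst_rsync --role joiner` scripts from earlier start attempts, so it may be worth checking for those in the same way before starting the node again.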

I then tried to start the second node again with systemctl start mariadb. On the first node, SHOW STATUS LIKE 'wsrep_cluster%' returns:

Variable_name             Value                                 
------------------------  --------------------------------------
wsrep_cluster_conf_id     2                                     
wsrep_cluster_size        2                                     
wsrep_cluster_state_uuid  00edfa0e-88d5-11e7-8f43-5ea901e83b3a  
wsrep_cluster_status      Primary  

However, another error then appeared:

# systemctl status mariadb.service
● mariadb.service - MariaDB database server
  Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
 Drop-In: /etc/systemd/system/mariadb.service.d
          └─migrated-from-my.cnf-settings.conf
  Active: failed (Result: timeout) since ven. 2017-08-25 10:42:09 CEST; 11min ago
 Process: 14656 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Process: 12697 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
 Process: 12685 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 15310
  CGroup: /system.slice/mariadb.service
          ├─15310 /usr/sbin/mysqld --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
          ├─15468 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 172.23.0.3 --datadir /home/mysql/ --parent 15310 --binlog /var/log/mariadb/binlog/mysql_binlog
          ├─15510 rsync --daemon --no-detach --port 4444 --config /home/mysql//rsync_sst.conf
          ├─15980 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 172.23.0.3 --datadir /home/mysql/ --parent 15901 --binlog /var/log/mariadb/binlog/mysql_binlog
          ├─18646 sleep 0.2
          ├─18670 sleep 0.2
          ├─18675 sleep 0.2
          ├─18676 sleep 0.2
          ├─18686 sleep 0.2
          ├─20357 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 172.23.0.3 --datadir /home/mysql/ --parent 20309 --binlog /var/log/mariadb/binlog/mysql_binlog
          ├─22006 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 172.23.0.3 --datadir /home/mysql/ --parent 21997 --binlog /var/log/mariadb/binlog/mysql_binlog
          └─23982 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 172.23.0.3 --datadir /home/mysql/ --parent 23794 --binlog /var/log/mariadb/binlog/mysql_binlog

août 25 10:39:10 ovh38 mysqld[15310]: 2017-08-25 10:39:10 115191544425216 [Note] WSREP: New cluster view: global state: 00edfa0e-88d5-11e7-8f43-5ea901e83b3a:0, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 3
août 25 10:39:10 ovh38 mysqld[15310]: 2017-08-25 10:39:10 115191544425216 [Warning] WSREP: Gap in state sequence. Need state transfer.
août 25 10:39:10 ovh38 mysqld[15310]: 2017-08-25 10:39:10 115190975993600 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.23.0.3' --datadir '/home/mysql/'   --parent '15310' --binlog '/var/log/...g/mysql_binlog' '
août 25 10:39:10 ovh38 rsyncd[15510]: rsyncd version 3.0.9 starting, listening on port 4444
août 25 10:39:13 ovh38 mysqld[15310]: 2017-08-25 10:39:13 115191014389504 [Note] WSREP: (42fd49d1, 'tcp://0.0.0.0:4567') turning message relay requesting off
août 25 10:40:39 ovh38 systemd[1]: mariadb.service start operation timed out. Terminating.
août 25 10:42:09 ovh38 systemd[1]: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
août 25 10:42:09 ovh38 systemd[1]: Failed to start MariaDB database server.
août 25 10:42:09 ovh38 systemd[1]: Unit mariadb.service entered failed state.
août 25 10:42:09 ovh38 systemd[1]: mariadb.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

Any ideas on how to solve this?

You must specify the IP addresses of all nodes in the configuration file of every node:

wsrep_cluster_address="gcomm://IP.node1,IP.node2,IP.node3"
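
Applied to the two nodes in the question, the same line would go into both configuration files. A minimal sketch that builds it from a list of IPs follows; `gcomm_address` is a hypothetical helper name used only for illustration, and the IPs are the ones given in the question.

```shell
# Hypothetical helper (not from the original post): join the node IPs
# with commas and print the resulting wsrep_cluster_address line.
gcomm_address() {
    local IFS=,     # "$*" joins the arguments with the first character of IFS
    printf 'wsrep_cluster_address="gcomm://%s"\n' "$*"
}

# For the cluster in the question:
#   gcomm_address 172.23.0.2 172.23.0.3
# prints
#   wsrep_cluster_address="gcomm://172.23.0.2,172.23.0.3"
```

With the full list on every node, an empty `gcomm://` is no longer left in the first node's configuration file. On MariaDB 10.1 with systemd, the first node of a new cluster is typically bootstrapped with the `galera_new_cluster` script instead.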

Please refer to the Galera documentation.

In addition, to avoid split-brain, you should add a third node or an arbitrator.

Source: https://dba.stackexchange.com/questions/184283