MariaDB
How do I restart a MariaDB Galera cluster?
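After all nodes crashed, I tried to recover the cluster, without success. I have only 2 nodes.
As the documentation says, I set a parameter on one of the nodes:
set global wsrep_provider_options="pc.bootstrap=true";
Then I tried to start the first node:
systemctl start mariadb
After that I got an error: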
Oct 11 16:11:12 proxy1 sh[2367]: 2016-10-11 16:11:12 140291677038720 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB) starting as process 2402 ...
Oct 11 16:11:15 proxy1 sh[2367]: WSREP: Recovered position b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB) starting as process 2434 ...
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Read nil XID from storage engines, skipping position init
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_load(): Galera 25.3.18(r3632) by Codership Oy <info@codership.com> loaded successfully.
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: CRC-32C: using hardware acceleration.
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Found saved state: b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:-1
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.0.41; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false;
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140046790919936 [Note] WSREP: Service thread queue flushed.
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Assign initial position for certification: 141, protocol version: -1
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_sst_grab()
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Start replication
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Setting initial position to b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: protonet asio version 0
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Using CRC-32C for message checksums.
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: backend: asio
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: gcomm thread scheduling priority set to other:0
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: restore pc from disk failed
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: GMCast version 0
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: EVS version 0
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: gcomm: connecting to group 'test_cluster', peer '192.168.0.41:,192.168.0.42:'
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 30a7b2e6 tcp://192.168.0.41:4567
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.41:4567' points to own listening address, blacklisting
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 30a7b2e6 tcp://192.168.0.41:4567
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 1ef15511 tcp://192.168.0.42:4567
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: declaring 1ef15511 at tcp://192.168.0.42:4567 stable
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: no nodes coming from prim view, prim not possible
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: view(view_id(NON_PRIM,1ef15511,2) memb {
Oct 11 16:11:15 proxy1 mysqld[2434]: 1ef15511,0
Oct 11 16:11:15 proxy1 mysqld[2434]: 30a7b2e6,0
Oct 11 16:11:15 proxy1 mysqld[2434]: } joined {
Oct 11 16:11:15 proxy1 mysqld[2434]: } left {
Oct 11 16:11:15 proxy1 mysqld[2434]: } partitioned {
Oct 11 16:11:15 proxy1 mysqld[2434]: })
Oct 11 16:11:18 proxy1 mysqld[2434]: 2016-10-11 16:11:18 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting off
Oct 11 16:11:19 proxy1 mysqld[2434]: 2016-10-11 16:11:19 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.42:4567
Oct 11 16:11:20 proxy1 mysqld[2434]: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: forgetting 1ef15511 (tcp://192.168.0.42:4567)
Oct 11 16:11:20 proxy1 mysqld[2434]: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting off
Oct 11 16:11:20 proxy1 mysqld[2434]: 2016-10-11 16:11:20 140047023368320 [Warning] WSREP: no nodes coming from prim view, prim not possible
Oct 11 16:11:20 proxy1 mysqld[2434]: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: view(view_id(NON_PRIM,30a7b2e6,3) memb {
Oct 11 16:11:20 proxy1 mysqld[2434]: 30a7b2e6,0
Oct 11 16:11:20 proxy1 mysqld[2434]: } joined {
Oct 11 16:11:20 proxy1 mysqld[2434]: } left {
Oct 11 16:11:20 proxy1 mysqld[2434]: } partitioned {
Oct 11 16:11:20 proxy1 mysqld[2434]: 1ef15511,0
Oct 11 16:11:20 proxy1 mysqld[2434]: })
Oct 11 16:11:25 proxy1 mysqld[2434]: 2016-10-11 16:11:25 140047023368320 [Note] WSREP: cleaning up 1ef15511 (tcp://192.168.0.42:4567)
Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [Note] WSREP: view((empty))
Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Oct 11 16:11:46 proxy1 mysqld[2434]: at gcomm/src/pc.cpp:connect():162
Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1380: Failed to open channel 'test_cluster' at 'gcomm://192.168.0.41,192.168.0.42': -110 (Connection timed out)
Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs connect failed: Connection timed out
Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: wsrep::connect(gcomm://192.168.0.41,192.168.0.42) failed: 7
Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] Aborting
Oct 11 16:11:47 proxy1 systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
Oct 11 16:11:47 proxy1 systemd[1]: Failed to start MariaDB database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has failed.
--
-- The result is failed.
Oct 11 16:11:47 proxy1 systemd[1]: Unit mariadb.service entered failed state.
Oct 11 16:11:47 proxy1 systemd[1]: mariadb.service failed.
Oct 11 16:11:47 proxy1 polkitd[570]: Unregistered Authentication Agent for unix-process:2360:148848 (system bus name :1.15, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
How can I recover the cluster?
MariaDB Galera Cluster:
Solution 1:
1) On one node, I changed the safe_to_bootstrap parameter to 1 in the file /var/lib/mysql/grastate.dat:
safe_to_bootstrap: 1
2) After that, I killed all MySQL processes:
killall -KILL mysql mysqld_safe mysqld mysql-systemd
3) Then I started a new cluster:
galera_new_cluster
4) I reconnected all the other nodes to the new node by restarting MariaDB on each of them:
systemctl restart mariadb
P.S. On CentOS, killall is provided by the psmisc package:
sudo yum install psmisc
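Before editing grastate.dat, it can help to see what each node has recorded there. The following is a minimal sketch and not part of the original answer: grastate_field is a hypothetical helper name, and the default path assumes the data directory /var/lib/mysql used above. It only reads the file and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MariaDB or Galera):
# print the value of one field from a grastate.dat file.
# Usage: grastate_field FIELD [FILE]
grastate_field() {
    field="$1"
    file="${2:-/var/lib/mysql/grastate.dat}"
    # Lines look like "seqno:   141"; drop the key and the padding after it.
    sed -n "s/^${field}:[[:space:]]*//p" "$file"
}

# Example, to be run on every node:
#   grastate_field seqno
#   grastate_field safe_to_bootstrap
```

The node with the highest seqno is the natural candidate for bootstrapping. After an unclean shutdown the saved seqno can be -1 (the log in the question reports "Found saved state: ...:-1"), in which case this file alone does not show which node is ahead.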
Solution 2:
Another way to restart a MariaDB Galera cluster is to use the --wsrep-new-cluster parameter.
1) Kill all MySQL processes:
killall -KILL mysql mysqld_safe mysqld mysql-systemd
2) Start a new cluster on the most up-to-date node:
/etc/init.d/mysql start --wsrep-new-cluster
3) Now the other nodes can connect:
service mysql start --wsrep_cluster_address="gcomm://192.168.0.101,192.168.0.102,192.168.0.103" \
  --wsrep_cluster_name="my_cluster"
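Step 2 depends on knowing which node is the most up to date. The log in the question contains a line of the form "WSREP: Recovered position <uuid>:<seqno>", so one way to compare nodes is to pull that seqno out of each node's log. This is a sketch under assumptions: recovered_seqno is a hypothetical helper name, the log format is taken from the question, and the journalctl example assumes a systemd unit called mariadb as in that log.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the seqno from the
# last "WSREP: Recovered position <uuid>:<seqno>" entry found.
recovered_seqno() {
    sed -n 's/.*WSREP: Recovered position [0-9a-f-]*:\(-\{0,1\}[0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Example, to be run on every node after a failed start attempt:
#   journalctl -u mariadb | recovered_seqno
```

The node that prints the largest number holds the most recent transactions and is the one to start with --wsrep-new-cluster.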
Percona XtraDB Cluster:
Solution 1:
If you can still connect to the most up-to-date node, you can make it bootstrap the cluster with the following SQL statement:
SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
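Whether the node actually formed a primary component afterwards can be checked through the wsrep_cluster_status status variable. A minimal sketch, assuming the mysql command-line client can log in without extra options; wsrep_status_value is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read tab-separated "name<TAB>value" rows on stdin
# (the format printed by `mysql -N -B`) and print the value for one variable.
wsrep_status_value() {
    awk -F '\t' -v name="$1" '$1 == name { print $2 }'
}

# Example:
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_status_value wsrep_cluster_status
```

A value of Primary means the node accepts writes again; non-Primary means the bootstrap has not taken effect.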
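Solution 2:
If all your nodes are dead and cannot start, you can shut down the old cluster and bring up a new one. You must stop every cluster node, because each one still holds information about the old nodes in the old cluster.
1) Kill all MySQL processes on all nodes: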
killall -KILL mysql mysqld_safe mysqld mysql-systemd
2) Start a new cluster on the most up-to-date node:
systemctl start mysql@bootstrap.service
3) Start the other nodes:
systemctl start mysql