Mysql
Percona - 重新啟動後無法啟動 mysql
我剛剛在 ubuntu 中配置了一個帶有 2 個節點的 percona xtradb 集群。
這就是我所做的。
- 推出 2 台新伺服器。
- 在兩個節點上都安裝了 percona 集群。
- 在兩個節點上都停止了 mysql。
在伺服器 1 上:
/etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf wsrep_cluster_address=gcomm://172.31.5.172,172.31.13.81 binlog_format=ROW wsrep_slave_threads= 8 wsrep_log_conflicts innodb_autoinc_lock_mode=2 wsrep_node_address=172.31.5.172 wsrep_cluster_name=pxc-cluster pxc_strict_mode=DISABLED wsrep_sst_method=xtrabackup-v2 wsrep_sst_auth="sstuser:s3cretPass"
在伺服器 2 上:
/etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf wsrep_cluster_address=gcomm://172.31.5.172,172.31.13.81 binlog_format=ROW wsrep_slave_threads= 8 wsrep_log_conflicts innodb_autoinc_lock_mode=2 wsrep_node_address=172.31.13.81 wsrep_cluster_name=pxc-cluster pxc_strict_mode=DISABLED wsrep_sst_method=xtrabackup-v2 wsrep_sst_auth="sstuser:s3cretPass"
在 server1 上:
/etc/init.d/mysql bootstrap-pxc
然後 mysql 啟動。CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass'; GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost'; FLUSH PRIVILEGES;
在伺服器 2 上:
service mysql start
截至目前沒有問題,測試了複製素。然後我重新啟動了兩個節點。
之後mysql將無法啟動。
伺服器1:
2017-11-05T17:22:03.505946Z mysqld_safe Logging to '/var/log/mysqld.log'. 2017-11-05T17:22:03.515607Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql 2017-11-05T17:22:03.520623Z mysqld_safe Skipping wsrep-recover for cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 pair 2017-11-05T17:22:03.521158Z mysqld_safe Assigning cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 to wsrep_start_position 2017-11-05T17:22:03.706708Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2017-11-05T17:22:03.708134Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.19-17-57-log) starting as process 11089 ... 2017-11-05T17:22:03.709661Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-5-172-bin' to avoid this problem. 2017-11-05T17:22:03.709964Z 0 [Note] WSREP: Setting wsrep_ready to false 2017-11-05T17:22:03.709977Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization. 2017-11-05T17:22:03.709982Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so' 2017-11-05T17:22:03.712249Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <info@codership.com> loaded successfully. 2017-11-05T17:22:03.712290Z 0 [Note] WSREP: CRC-32C: using hardware acceleration. 2017-11-05T17:22:03.712549Z 0 [Note] WSREP: Found saved state: cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0, safe_to_bootsrap: 1 2017-11-05T17:22:03.713823Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.31.5.172; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992; 2017-11-05T17:22:03.727615Z 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1 2017-11-05T17:22:03.727630Z 0 [Note] WSREP: Preparing to initiate SST/IST 2017-11-05T17:22:03.727634Z 0 [Note] WSREP: Starting replication 2017-11-05T17:22:03.727640Z 0 [Note] WSREP: Setting initial position to cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 2017-11-05T17:22:03.727741Z 0 [Note] WSREP: Using CRC-32C for message checksums. 2017-11-05T17:22:03.727792Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 2017-11-05T17:22:03.727852Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:03.727859Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:03.728266Z 0 [Note] WSREP: GMCast version 0 2017-11-05T17:22:03.728398Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 2017-11-05T17:22:03.728405Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 2017-11-05T17:22:03.728686Z 0 [Note] WSREP: EVS version 0 2017-11-05T17:22:03.728771Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:' 2017-11-05T17:22:03.728686Z 0 [Note] WSREP: EVS version 0 2017-11-05T17:22:03.728771Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:' 2017-11-05T17:22:03.729595Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection established to d73a400b tcp://172.31.5.172:4567 2017-11-05T17:22:03.729609Z 0 [Warning] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') address 'tcp://172.31.5.172:4567' points to own listening address, blacklisting 2017-11-05T17:22:06.729830Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:06.729954Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer d73a400b with addr tcp://172.31.5.172:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:06.730072Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:22:06.730101Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,d73a400b,1) memb { d73a400b,0 } joined { } left { } partitioned { } ) 2017-11-05T17:22:07.230228Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50154S), skipping check 2017-11-05T17:22:10.730540Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:14.731257Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:16.154486Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection established to de905d70 tcp://172.31.13.81:4567 2017-11-05T17:22:16.154851Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 2017-11-05T17:22:16.654815Z 0 [Note] WSREP: declaring de905d70 at tcp://172.31.13.81:4567 stable 2017-11-05T17:22:16.655411Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:22:16.655466Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,d73a400b,2) memb { d73a400b,0 de905d70,0 } joined { } left { } partitioned { } ) 2017-11-05T17:22:19.232193Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') turning message relay requesting off
伺服器2:
2017-11-05T17:22:15.979415Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2017-11-05T17:22:15.981265Z 0 [Note] mysqld (mysqld 5.7.19-17-57-log) starting as process 1325 ... 2017-11-05T17:22:15.983852Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-13-81-bin' to avoid this problem. 2017-11-05T17:22:15.984031Z 0 [Note] WSREP: Setting wsrep_ready to false 2017-11-05T17:22:15.984043Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization. 2017-11-05T17:22:15.984047Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so' 2017-11-05T17:22:16.013201Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <info@codership.com> loaded successfully. 2017-11-05T17:22:16.013252Z 0 [Note] WSREP: CRC-32C: using hardware acceleration. 2017-11-05T17:22:16.015682Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 0 2017-11-05T17:22:16.019709Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.31.13.81; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992; 2017-11-05T17:22:16.034431Z 0 [Note] WSREP: GCache history reset: cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 -> 00000000-0000-0000-0000-000000000000:-1 2017-11-05T17:22:16.035026Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1 2017-11-05T17:22:16.035038Z 0 [Note] WSREP: Preparing to initiate SST/IST 2017-11-05T17:22:16.035041Z 0 [Note] WSREP: Starting replication 2017-11-05T17:22:16.035048Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1 2017-11-05T17:22:16.035159Z 0 [Note] WSREP: Using CRC-32C for message checksums. 2017-11-05T17:22:16.035214Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 2017-11-05T17:22:16.036228Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:16.036250Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:16.036685Z 0 [Note] WSREP: GMCast version 0 2017-11-05T17:22:16.036828Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 2017-11-05T17:22:16.036836Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 2017-11-05T17:22:16.037997Z 0 [Note] WSREP: EVS version 0 2017-11-05T17:22:16.038072Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:' 2017-11-05T17:22:16.039496Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection established to de905d70 tcp://172.31.13.81:4567 2017-11-05T17:22:16.039513Z 0 [Warning] WSREP: (de905d70, 'tcp://0.0.0.0:4567') address 'tcp://172.31.13.81:4567' points to own listening address, blacklisting 2017-11-05T17:22:16.039960Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection established to d73a400b tcp://172.31.5.172:4567 2017-11-05T17:22:16.040014Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 2017-11-05T17:22:16.540329Z 0 [Note] WSREP: declaring d73a400b at tcp://172.31.5.172:4567 stable 2017-11-05T17:22:16.540836Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:22:16.540862Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,d73a400b,2) memb { d73a400b,0 de905d70,0 } joined { } left { } partitioned { } ) 2017-11-05T17:22:19.039641Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection to peer de905d70 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:19.539775Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting off 2017-11-05T17:22:37.061048Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.31.5.172:4567 2017-11-05T17:22:38.061306Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') reconnecting to d73a400b (tcp://172.31.5.172:4567), attempt 0 2017-11-05T17:22:41.562259Z 0 [Note] WSREP: evs::proto(de905d70, OPERATIONAL, view_id(REG,d73a400b,2)) suspecting node: d73a400b 2017-11-05T17:22:41.562325Z 0 [Note] WSREP: evs::proto(de905d70, OPERATIONAL, view_id(REG,d73a400b,2)) suspected node without join message, declaring inactive 2017-11-05T17:22:42.562445Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,d73a400b,2) memb { de905d70,0 } joined { } left { } partitioned { d73a400b,0 } ) 2017-11-05T17:22:42.562525Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:22:42.562550Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,de905d70,3) memb { de905d70,0 } joined { } left { } partitioned { d73a400b,0 } ) 2017-11-05T17:22:47.071058Z 0 [Note] WSREP: Current view of cluster as seen by this node view ((empty)) 2017-11-05T17:22:47.071334Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():158 2017-11-05T17:22:47.071356Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out) 2017-11-05T17:22:47.071412Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1513: Failed to open channel 'pxc-cluster' at 'gcomm://172.31.5.172,172.31.13.81': -110 (Connection timed out) 2017-11-05T17:22:47.071421Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out 2017-11-05T17:22:47.071426Z 0 [ERROR] WSREP: Provider/Node (gcomm://172.31.5.172,172.31.13.81) failed to establish connection with cluster (reason: 7) 2017-11-05T17:22:47.071430Z 0 [ERROR] Aborting 2017-11-05T17:22:47.071433Z 0 [Note] Giving 0 client threads a chance to die gracefully 2017-11-05T17:22:47.071437Z 0 [Note] WSREP: Waiting for active wsrep applier to exit 2017-11-05T17:22:47.071440Z 0 [Note] WSREP: Service disconnected. 2017-11-05T17:22:47.071443Z 0 [Note] WSREP: Waiting to close threads...... 2017-11-05T17:22:52.072202Z 0 [Note] WSREP: Some threads may fail to exit. 2017-11-05T17:22:52.072285Z 0 [Note] Binlog end 2017-11-05T17:22:52.072905Z 0 [Note] mysqld: Shutdown complete 2017-11-05T17:22:52.322273Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2017-11-05T17:22:52.325253Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-13-81-bin' to avoid this problem. 2017-11-05T17:22:52.344397Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:52.346176Z 0 [Warning] WSREP: (f4348c81, 'tcp://0.0.0.0:4567') address 'tcp://172.31.13.81:4567' points to own listening address, blacklisting 2017-11-05T17:22:52.847728Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:23:22.856010Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():158 2017-11-05T17:23:22.856051Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out) 2017-11-05T17:23:22.856128Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1513: Failed to open channel 'pxc-cluster' at 'gcomm://172.31.5.172,172.31.13.81': -110 (Connection timed out) 2017-11-05T17:23:22.856140Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out 2017-11-05T17:23:22.856147Z 0 [ERROR] WSREP: Provider/Node (gcomm://172.31.5.172,172.31.13.81) failed to establish connection with cluster (reason: 7) 2017-11-05T17:23:22.856151Z 0 [ERROR] Aborting
是的,瑞克是對的。因為所有節點都已關閉,您必須再次引導集群。
您優雅地停止了所有節點。
如果您知道什麼是最後一個(節點)停止,請選擇此節點太引導,然後啟動另一個節點。
但是如果你不記得你關閉的最後一個節點,從所有節點打開 grastate.dat(位於 mysql 目錄中)並比較裡面的 seqno 編號並選擇一個具有更大編號的節點並首先啟動該節點。如果您選擇了錯誤的引導節點,另一個節點必須刪除所有數據並再次向 SST 請求。所以要小心