* master host_2
dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_2.000008 | 5445 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
* slave host_1
Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446385
Auto_Position: 1
* etl host_3
Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:46-50,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446386-446388
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388
Auto_Position: 1
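The three GTID snapshots above can be compared mechanically. The following is a minimal sketch, not part of MHA, that parses `Executed_Gtid_Set` strings and reports which transaction intervals each replica has not yet executed relative to the master. The helper names `parse_gtid_set` and `missing` are made up for illustration; on a live server, MySQL's built-in `GTID_SUBTRACT()` function answers the same question.

```python
def parse_gtid_set(text):
    """Parse a GTID set string into {source_uuid: [(first, last), ...]}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid, [])
        for interval in intervals:
            first, _, last = interval.partition("-")
            # A single transaction is written as "uuid:7", without a dash.
            ranges.append((int(first), int(last or first)))
    return result


def missing(master, replica):
    """Return the intervals present in `master` but absent from `replica`."""
    gaps = {}
    for uuid, m_ranges in master.items():
        r_ranges = sorted(replica.get(uuid, []))
        for first, last in m_ranges:
            cursor = first  # lowest transaction number not yet accounted for
            for r_first, r_last in r_ranges:
                if r_last < cursor or r_first > last:
                    continue
                if r_first > cursor:
                    gaps.setdefault(uuid, []).append((cursor, r_first - 1))
                cursor = max(cursor, r_last + 1)
            if cursor <= last:
                gaps.setdefault(uuid, []).append((cursor, last))
    return gaps


# Executed_Gtid_Set values copied from the snapshots above.
master = parse_gtid_set(
    "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,"
    "ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392"
)
host_1 = parse_gtid_set(
    "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,"
    "ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446385"
)
host_3 = parse_gtid_set(
    "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,"
    "ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388"
)

print("host_1 is missing:", missing(master, host_1))
print("host_3 is missing:", missing(master, host_3))
```

With these inputs, host_1 is behind by transactions 446386-446392 and host_3 by 446389-446392, so host_3 (the etl replica) holds more of the master's transactions than host_1 (the slave) before the failover starts.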
* Isolate the master's network so that it is equivalent to a crashed host
master> iptables -A INPUT -p tcp -s other_host --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error
Fri Nov 10 11:12:38 2017 - [info] MHA::MasterFailover version 0.56.
Fri Nov 10 11:12:38 2017 - [info] Starting master failover.
Fri Nov 10 11:12:38 2017 - [info]
Fri Nov 10 11:12:38 2017 - [info] * Phase 1: Configuration Check Phase..
Fri Nov 10 11:12:38 2017 - [info]
Fri Nov 10 11:13:28 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_2! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Nov 10 11:13:28 2017 - [warning] Failed to SSH to binlog server host_2
Fri Nov 10 11:13:29 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Fri Nov 10 11:13:29 2017 - [info] Binlog server host_1 is reachable.
Fri Nov 10 11:13:29 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Fri Nov 10 11:13:29 2017 - [info] Binlog server host_3 is reachable.
Fri Nov 10 11:13:29 2017 - [warning] SQL Thread is stopped(no error) on host_1( host_1:3306)
Fri Nov 10 11:13:29 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)
Fri Nov 10 11:13:29 2017 - [info] GTID failover mode = 1
Fri Nov 10 11:13:29 2017 - [info] Dead Servers:
Fri Nov 10 11:13:29 2017 - [info] host_2( host_2:3306)
Fri Nov 10 11:13:29 2017 - [info] Checking master reachability via MySQL(double check)...
Fri Nov 10 11:13:30 2017 - [info] ok.
Fri Nov 10 11:13:30 2017 - [info] Alive Servers:
Fri Nov 10 11:13:30 2017 - [info] host_1( host_1:3306)
Fri Nov 10 11:13:30 2017 - [info] host_3( host_3:3306)
Fri Nov 10 11:13:30 2017 - [info] Alive Slaves:
Fri Nov 10 11:13:30 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:13:30 2017 - [info] GTID ON
Fri Nov 10 11:13:30 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:13:30 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Nov 10 11:13:30 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:13:30 2017 - [info] GTID ON
Fri Nov 10 11:13:30 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:13:30 2017 - [info] Not candidate for the new Master (no_master is set)
Fri Nov 10 11:13:30 2017 - [info] Starting SQL thread on host_1( host_1:3306) ..
Fri Nov 10 11:13:30 2017 - [info] done.
Fri Nov 10 11:13:30 2017 - [info] Starting SQL thread on host_3( host_3:3306) ..
Fri Nov 10 11:13:30 2017 - [info] done.
Fri Nov 10 11:13:30 2017 - [info] Starting GTID based failover.
Fri Nov 10 11:13:30 2017 - [info]
Fri Nov 10 11:13:30 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Nov 10 11:13:30 2017 - [info]
Fri Nov 10 11:13:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Nov 10 11:13:30 2017 - [info]
Fri Nov 10 11:14:20 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_2! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Nov 10 11:14:20 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Nov 10 11:14:20 2017 - [info] Executing master IP deactivation script:
Fri Nov 10 11:14:20 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --command=stop
ssh: connect to host host_2 port 22: Connection timed out
=================== swift vip : tgw_vip from host_2 is deleted ==============================
--2017-11-10 11:14:27-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: “STDOUT”
0K 11.4M=0s
2017-11-10 11:16:27 (11.4 MB/s) - written to stdout [38]
Fri Nov 10 11:16:27 2017 - [info] done.
Fri Nov 10 11:16:27 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Nov 10 11:16:27 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] * Phase 3: Master Recovery Phase..
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] The latest binary log file/position on all slaves is host_2.000008:4265
Fri Nov 10 11:16:27 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:46-50,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446386-446388
Fri Nov 10 11:16:27 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Nov 10 11:16:27 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:16:27 2017 - [info] GTID ON
Fri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:16:27 2017 - [info] Not candidate for the new Master (no_master is set)
Fri Nov 10 11:16:27 2017 - [info] The oldest binary log file/position on all slaves is host_2.000008:3380
Fri Nov 10 11:16:27 2017 - [info] Oldest slaves:
Fri Nov 10 11:16:27 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:16:27 2017 - [info] GTID ON
Fri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:16:27 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] * Phase 3.3: Determining New Master Phase..
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] Searching new master from slaves..
Fri Nov 10 11:16:27 2017 - [info] Candidate masters from the configuration file:
Fri Nov 10 11:16:27 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:16:27 2017 - [info] GTID ON
Fri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:16:27 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Nov 10 11:16:27 2017 - [info] Non-candidate masters:
Fri Nov 10 11:16:27 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:16:27 2017 - [info] GTID ON
Fri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:16:27 2017 - [info] Not candidate for the new Master (no_master is set)
Fri Nov 10 11:16:27 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Nov 10 11:16:27 2017 - [info] Not found.
Fri Nov 10 11:16:27 2017 - [info] Searching from all candidate_master slaves..
Fri Nov 10 11:16:27 2017 - [info] New master is host_1( host_1:3306)
Fri Nov 10 11:16:27 2017 - [info] Starting master failover..
Fri Nov 10 11:16:27 2017 - [info]
From:
host_2( host_2:3306) (current master)
+-- host_1( host_1:3306)
+-- host_3( host_3:3306)
To:
host_1( host_1:3306) (new master)
+-- host_3( host_3:3306)
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] Waiting all logs to be applied..
Fri Nov 10 11:16:27 2017 - [info] done.
Fri Nov 10 11:16:27 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply..
Fri Nov 10 11:16:27 2017 - [info] Waiting all logs to be applied on the latest slave..
Fri Nov 10 11:16:27 2017 - [info] Resetting slave host_1( host_1:3306) and starting replication from the new master host_3( host_3:3306)..
Fri Nov 10 11:16:27 2017 - [info] Executed CHANGE MASTER.
Fri Nov 10 11:16:28 2017 - [info] Slave started.
Fri Nov 10 11:16:28 2017 - [info] Waiting to execute all relay logs on host_1( host_1:3306)..
Fri Nov 10 11:16:28 2017 - [info] master_pos_wait( host_3.000049:40136) completed on host_1( host_1:3306). Executed 0 events.
Fri Nov 10 11:16:28 2017 - [info] done.
Fri Nov 10 11:16:28 2017 - [info] done.
Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_2 started, pid: 43038
Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_1 started, pid: 43039
Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_3 started, pid: 43041
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Log messages from host_2 ...
Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_2.
Fri Nov 10 11:16:28 2017 - [warning] SSH is not reachable on host_2. Skipping
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Log messages from host_1 ...
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Fetching binary logs from binlog server host_1..
Fri Nov 10 11:16:28 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000008 --start_pos=4265 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171110111238.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Failed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
at /usr/bin/save_binary_logs line 123
eval {...} called at /usr/bin/save_binary_logs line 70
main::main() called at /usr/bin/save_binary_logs line 66
Fri Nov 10 11:16:28 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_1.
Fri Nov 10 11:16:28 2017 - [warning] Got error from host_1.
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Log messages from host_3 ...
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Fetching binary logs from binlog server host_3..
Fri Nov 10 11:16:28 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000008 --start_pos=4265 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171110111238.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Failed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
at /usr/bin/save_binary_logs line 123
eval {...} called at /usr/bin/save_binary_logs line 70
main::main() called at /usr/bin/save_binary_logs line 66
Fri Nov 10 11:16:28 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_3.
Fri Nov 10 11:16:28 2017 - [warning] Got error from host_3.
Fri Nov 10 11:16:28 2017 - [info] Getting new master's binlog name and position..
Fri Nov 10 11:16:28 2017 - [info] host_1.000058:4059
Fri Nov 10 11:16:28 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Nov 10 11:16:28 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_1.000058, 4059, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388
Fri Nov 10 11:16:28 2017 - [info] Executing master IP activate script:
Fri Nov 10 11:16:28 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --new_master_host= host_1 --new_master_ip= host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Unknown option: new_master_user
Unknown option: new_master_password
=================== swift vip : tgw_vip to host_1 is added ==============================
Fri Nov 10 11:16:30 2017 - [info] OK.
Fri Nov 10 11:16:30 2017 - [info] ** Finished master recovery successfully.
Fri Nov 10 11:16:30 2017 - [info] * Phase 3: Master Recovery Phase completed.
Fri Nov 10 11:16:30 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info] * Phase 4: Slaves Recovery Phase..
Fri Nov 10 11:16:30 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Nov 10 11:16:30 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 46878. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171110111238.log if it takes time..
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:31 2017 - [info] Log messages from host_3 ...
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_1( host_1:3306)..
Fri Nov 10 11:16:30 2017 - [info] Executed CHANGE MASTER.
Fri Nov 10 11:16:31 2017 - [info] Slave started.
Fri Nov 10 11:16:31 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388) completed on host_3( host_3:3306). Executed 0 events.
Fri Nov 10 11:16:31 2017 - [info] End of log messages from host_3.
Fri Nov 10 11:16:31 2017 - [info] -- Slave on host host_3( host_3:3306) started.
Fri Nov 10 11:16:31 2017 - [info] All new slave servers recovered successfully.
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:31 2017 - [info] * Phase 5: New master cleanup phase..
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:31 2017 - [info] Resetting slave info on the new master..
Fri Nov 10 11:16:31 2017 - [info] host_1: Resetting slave info succeeded.
Fri Nov 10 11:16:31 2017 - [info] Master failover to host_1( host_1:3306) completed successfully.
Fri Nov 10 11:16:31 2017 - [info]
----- Failover Report -----
bak_mha_test: MySQL Master failover host_2( host_2:3306) to host_1( host_1:3306) succeeded
Master host_2( host_2:3306) is down!
Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on host_2( host_2:3306)
Selected host_1( host_1:3306) as a new master.
host_1( host_1:3306): OK: Applying all logs succeeded.
host_1( host_1:3306): OK: Activated master IP address.
host_3( host_3:3306): OK: Slave started, replicating from host_1( host_1:3306)
host_1( host_1:3306): Resetting slave info succeeded.
Master failover to host_1( host_1:3306) completed successfully.
Fri Nov 10 11:16:31 2017 - [info] Sending mail..
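The "Determining New Master Phase" in the log above picks host_1 even though host_3 holds the latest relay log events, because host_3 has `no_master` set. The sketch below reproduces only the search order visible in that log (first candidate_master replicas that are also the latest, then all candidate_master replicas). The `Replica` class, the `pick_new_master` function, and the final fallback to non-candidate replicas are assumptions made for illustration, not MHA's actual code, and real MHA applies further checks that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    candidate_master: bool = False  # candidate_master is set in the MHA config
    no_master: bool = False         # no_master is set in the MHA config
    is_latest: bool = False         # received the latest relay log events


def pick_new_master(replicas):
    """Simplified new-master search, following the order seen in the log."""
    eligible = [r for r in replicas if not r.no_master]
    candidates = [r for r in eligible if r.candidate_master]

    # 1. candidate_master replicas that received the latest relay log events
    for r in candidates:
        if r.is_latest:
            return r
    # 2. any candidate_master replica
    if candidates:
        return candidates[0]
    # 3. ASSUMPTION: fall back to the remaining eligible replicas,
    #    preferring one that is the latest
    for r in eligible:
        if r.is_latest:
            return r
    return eligible[0] if eligible else None


# Flags as reported in the log: host_1 is the candidate but the oldest,
# host_3 is the latest but excluded by no_master.
replicas = [
    Replica("host_1", candidate_master=True),
    Replica("host_3", no_master=True, is_latest=True),
]
print(pick_new_master(replicas).name)
```

Because the chosen master is not the latest replica, the log shows host_1 first replicating from host_3 to catch up to 1-446388 before it is promoted. Transactions 446389-446392, which the master had executed but no replica had received, could not be saved because SSH to host_2 was unreachable, so they appear to be absent from the new master.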
If the dead master later comes back online, this step must be run on it:
dead_master> /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c
http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md (see the tgw section for a detailed description)