测试MHA 的高可用
在MySQL MHA原理(一)中,我们已经可以完整的搭建配置MHA高可用集群了,本章节来主要讲述对高可用功能的验证。 所谓高可用,就是能满足多种异常情况下的热服务,所以这里测试需要很详尽的测试用例。
这里我首先给出测试Case及结果,有兴趣的同学可以翻看后面的具体过程:
在MHA自动切换的情况下:
- Case1:Master 意外宕掉,Slave正常,备选Slave正常。
- Case2 : Master 意外宕掉,Slave落后很多binlog,备选Slave正常。
- Case3:Master 意外宕掉,Slave正常,备选Slave落后很多binlog。
- Case4:Master 意外宕掉,Slave复制意外停止,备选Slave正常。
- Case5: Master 意外宕掉,Slave正常,备选Slave复制意外停止。
- Case6:Master 意外宕掉,Slave正常,备选Slave也意外宕掉。
- Case7:Master too many connection,Slave正常,备选Slave正常。
- Case8:Master 正常工作,Slave (或备选Slave)意外宕掉。
- Case9:Master 网路瞬断(1~30秒)。
- Case10:有二次检测机制下,如果Manager通过一台Slave无法连接(另一台正常)。
在MHA手动触发的情况下:
- Case1:Master 意外宕掉,Slave正常,备选Slave正常。
- Case2 : Master 意外宕掉,Slave落后很多binlog,备选Slave正常。
- Case3:Master 意外宕掉,Slave正常,备选Slave落后很多binlog。
- Case4:Master 意外宕掉,Slave复制意外停止,备选Slave正常。
- Case5: Master 意外宕掉,Slave正常,备选Slave复制意外停止。
- Case6:Master 意外宕掉,Slave正常,备选Slave也意外宕掉。
- Case7:Master too many connection,Slave正常,备选Slave正常。
- Case8:Master 正常工作,Slave (或备选Slave)意外宕掉。
- Case9:Master 有大的事务或写操作在进行,Slave正常,备选Slave也正常
- Case10:有二次检测机制下,如果Manager通过一台Slave无法连接(另一台正常)。
如上,这些Case是我们线上的数据库可能会遇到的意外情况,不要觉得多,因为数据库宕机本身就发生在一些类似于上述的特定场景。接下来我们将一一尝试,进行上述case的验证。
在MHA自动切换的情况下
Case1:Master 意外宕掉,Slave正常,备选Slave正常
- Step1:模拟Master意外宕机
1 2 3 |
# 在Master执行 [root@master]# killall -9 mysqld mysqld_safe |
- 结果:MHA成功切换,Slave被执行新的Master并成功开始复制
下面是manager节点上的mha切换日志,我们可以清晰的看到它的工作流程:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
·····省略一部分······ Phase 1: Configuration Check Phase.. //第一阶段,配置检测 ········· Phase 2: Dead Master Shutdown Phase.. //第二阶段,主动关闭有问题的master ········· Phase 3: Master Recovery Phase.. //第三阶段,Master切换阶段 ··········· Phase 3.1: Getting Latest Slaves Phase.. // 包含在第三阶段之内,3.1阶段,获得各个slave的状态 ········ Phase 3.2: Saving Dead Master's Binlog Phase.. // 保存已死master的binlog日志 ········ Phase 3.3: Determining New Master Phase.. // 抉择出新的Master ········· Phase 3.4: Master Log Apply Phase.. // 应用以故master的binlog阶段 ······· Phase 4: Slaves Recovery Phase.. // 第四阶段,从库的恢复 ······· Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. // 启动从库sql线程去执行日志 ······· Phase 4.2: Starting Parallel Slave Log Apply Phase.. // 应用差异日志 ,并将从库切换到新的master ······· Phase 5: New master cleanup phase.. //第五阶段,设置新master的一些东西,例如打开read_only,等等 ----- Failover Report ----- |
如上是我省略了一些日志的内容的结果,我们发现MHA的工作内容被清晰的记录在了日志中,出了问题之后我们可以根据日志内容定位和查看。
Case2 : Master 意外宕掉,Slave落后很多binlog,备选Slave正常
- Step1:模拟Slave延迟,在Slave执行如下语句
1 |
mysql> stop slave io_thread; |
- Step2:在Mster上生成大量请求
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --mysql-db=sbtest --oltp-table-name=sbtest2 --init-rng=on --num-threads=100 --max-requests=0 --max-time=100 --mysql-user=mha_user --mysql-socket=/tmp/mysql3306.sock --mysql-password="xxxxxx" --db-driver=mysql --oltp-test-mode=complex run sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 100 Initializing random number generator from timer. Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! Time limit exceeded, exiting... (last message repeated 99 times) Done. OLTP test statistics: queries performed: read: 15088850 write: 5383321 other: 2153711 total: 22625882 transactions: 1075936 (10758.62 per sec.) deadlocks: 1839 (18.39 per sec.) read/write requests: 20472171 (204707.67 per sec.) other operations: 2153711 (21535.63 per sec.) Test execution summary: total time: 100.0069s total number of events: 1075936 total time taken by event execution: 9993.0726 per-request statistics: min: 2.24ms avg: 9.29ms max: 136.91ms approx. 95 percentile: 14.55ms Threads fairness: events (avg/stddev): 10759.3600/33.57 execution time (avg/stddev): 99.9307/0.00 |
- Step3:模拟主库意外宕屌,并立刻开启Slave的io线程,模拟Slave延迟
1 2 3 4 5 6 7 8 |
# 在Master 上执行 [root@master]# killall -9 mysqld mysqld_safe # 在Slave上执行 mysql> start slave io_thread; |
- 结果:MHA成功切换,但耗时很长,大概耗时17分钟。
Case3:Master 意外宕掉,Slave正常,备选Slave落后很多binlog
- Step:测试步骤与Case2类似,这里忽略不再详述。
- 结果:MHA成功切换,但耗时很长。
Case4:Master 意外宕掉,Slave复制意外停止,备选Slave正常
- Step1:模拟Slave复制意外停止
1 2 3 |
# 在Slave执行如下SQL,直接关停主从复制 mysql> stop slave; |
- Step2:模拟主库意外宕机
1 2 3 |
# 在Master执行 [root@master]# killall -9 mysqld mysqld_safe |
- 结果:MHA成功切换,Slave被执行新的Master并成功开始复制
Case5: Master 意外宕掉,Slave正常,备选Slave复制意外停止
- Step:测试步骤与Case4类似,这里忽略不再详述。
- 结果:MHA成功切换,备选Slave成功上位,Slave指向新的Master。
Case6:Master 意外宕掉,Slave正常,备选Slave也意外宕掉
- Step1: 模拟备选Slave意外宕掉
1 2 3 |
# 在备选Slave执行 [root@slave2]# killall -9 mysqld mysqld_safe |
- Step2:模拟Master意外宕掉
1 2 3 |
# 在Master执行 [root@master]# killall -9 mysqld mysqld_safe |
- 结果:MHA切换失败,因为只剩一台Slave,也失去了切换的意义。
Case7:Master too many connection,Slave正常,备选Slave正常
- Step1:模拟Master 链接数过多的情况
1 2 3 4 5 6 7 8 9 |
[root@ master]# mysqlslap --concurrency=4000 --iterations=100 --query='insert into tc(name) select ("nxjisbxxs");' -hlocalhost -uroot -p"xxxxxxx" --socket=/tmp/mysql3306.sock --create-schema=sbtest mysqlslap: [Warning] Using a password on the command line interface can be insecure. mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections |
- 结果:MHA不会自动切换,经过确认,MHA的Manager用于检测主库是否宕机的数据库链接是长链接,在MySQL默认的配置文件下,长链接只有空闲8小时才会断开,所以Master too many connection 会导致新的链接无法建立,但是已经存在的长连接并没有受影响,所以Manager仍然可以正常检测到主库没有宕机,不会触发切换。
Case8:Master 正常工作,Slave (或备选Slave)意外宕掉
- Step:测试比较简单,直接干掉Slave观察即可,这里不再详述。
- 结果:MHA不会主动切换,Master不宕掉的话,他不会管Slave或者备选Slave的。
Case9:Master 网路瞬断(1~30秒)
- Step1: 模拟网路瞬断
1 2 3 4 5 6 |
# 在Master上将Manager和所有Slave的IP地址给禁掉 [root@master ]# iptables -I INPUT -s 10.67.250.2 -j DROP [root@master ]# iptables -I INPUT -s 10.67.250.3 -j DROP 这样对于Manager和Slave来说Master的网路就断开了。 |
- Step2:等待你要测试的瞬断时长,然后再恢复Master网路
1 |
[root@master ]# /etc/init.d/iptables reload |
- 结果:大概在(5~8)秒左右不会引起MHA的切换,超过这个值就会了,与你设置的 ping_interval 参数有关。
Case10:有二次检测机制下,如果Manager通过一台Slave无法连接(另一台正常)
- Step1: 模拟Manager无法正常连接Master
1 2 3 |
# 在Master上将Manager的IP地址给禁掉 [root@master ]# iptables -I INPUT -s 10.67.250.2 -j DROP |
- Step2:观察Manager状态和日志
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
# 产生如下日志 Fri May 26 21:24:03 2017 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431. Fri May 26 21:24:03 2017 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 10.67.250.2 -s 10.67.250.3 --user=mha_controller --port=4344 --master_host=10.67.250.1 --master_ip=10.67.250.1 --master_port=3306 --master_user=root --master_password="xxxxxx" --ping_type=SELECT --user=mha_controller --master_host=10.67.250.1 --master_ip=10.67.250.1 --master_port=3306 --master_user=root --master_password=xxxxxx --ping_type=SELECT Fri May 26 21:24:03 2017 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/home/data/mysql3306 --output_file=/home/mha_controller/workdir//save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysqld-bin Fri May 26 21:24:06 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.67.250.1' (4)) Fri May 26 21:24:06 2017 - [warning] Connection failed 2 time(s).. Fri May 26 21:24:08 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to 10.67.250.1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342. Monitoring server 10.67.250.2 is reachable, Master is not reachable from 10.67.250.2. OK. Master is reachable from 10.67.250.3! Fri May 26 21:24:08 2017 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen. Fri May 26 21:24:09 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.67.250.1' (4)) Fri May 26 21:24:09 2017 - [warning] Connection failed 3 time(s).. Fri May 26 21:24:12 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.67.250.1' (4)) Fri May 26 21:24:12 2017 - [warning] Connection failed 4 time(s).. Fri May 26 21:24:12 2017 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details. Fri May 26 21:24:15 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.67.250.1' (4)) Fri May 26 21:24:15 2017 - [warning] Connection failed 1 time(s).. Fri May 26 21:24:15 2017 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 10.67.250.2 -s 10.67.250.3 --user=mha_controller --port=4344 --master_host=10.67.250.1 --master_ip=10.67.250.1 --master_port=3306 --master_user=root --master_password="xxxxxx" --ping_type=SELECT --user=mha_controller --master_host=10.67.250.1 --master_ip=10.67.250.1 --master_port=3306 --master_user=root --master_password=xxxxxx --ping_type=SELECT Fri May 26 21:24:15 2017 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/home/data/mysql3306 --output_file=/home/mha_controller/workdir//save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysqld-bin Fri May 26 21:24:18 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.67.250.1' (4)) Fri May 26 21:24:18 2017 - [warning] Connection failed 2 time(s).. Fri May 26 21:24:20 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to 10.67.250.1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342. Monitoring server 10.67.250.2 is reachable, Master is not reachable from 10.67.250.2. OK. Master is reachable from 10.67.250.3! Fri May 26 21:24:20 2017 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen. Fri May 26 21:24:21 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.67.250.1' (4)) Fri May 26 21:24:21 2017 - [warning] Connection failed 3 time(s).. Fri May 26 21:24:24 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.67.250.1' (4)) Fri May 26 21:24:24 2017 - [warning] Connection failed 4 time(s).. Fri May 26 21:24:24 2017 - [warning] Secondary network check script returned errors. <span style="color: #ff0000;"><strong>Failover should not start so checking server status again. Check network settings for details.</strong></span> |
- 结果:MHA可以通过另外一个Slave检测到Master是OK的,所以不会触发切换
在MHA手动触发的情况下
Case1:Master 意外宕掉,Slave正常,备选Slave正常
- Step1: 模拟Master意外宕掉
1 2 3 |
# 在Master执行 [root@master]# killall -9 mysqld mysqld_safe |
- Step2:在Manager主动触发切换
1 |
[mha_user@slave1 ~]$ masterha_master_switch --master_state=dead --dead_master_host=10.67.250.1 --ignore_last_failover --global_conf=/etc/mha.cnf --conf=/etc/mha.cnf |
- 结果:成功触发MHA切换
Case2 : Master 意外宕掉,Slave落后很多binlog,备选Slave正常
- Step1:模拟Slave延迟,在Slave执行如下语句
1 |
mysql> stop slave io_thread; |
- Step2:在Mster上生成大量请求
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --mysql-db=sbtest --oltp-table-name=sbtest2 --init-rng=on --num-threads=100 --max-requests=0 --max-time=100 --mysql-user=mha_user --mysql-socket=/tmp/mysql3306.sock --mysql-password="xxxxxx" --db-driver=mysql --oltp-test-mode=complex run sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 100 Initializing random number generator from timer. Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! Time limit exceeded, exiting... (last message repeated 99 times) Done. OLTP test statistics: queries performed: read: 15088850 write: 5383321 other: 2153711 total: 22625882 transactions: 1075936 (10758.62 per sec.) deadlocks: 1839 (18.39 per sec.) read/write requests: 20472171 (204707.67 per sec.) other operations: 2153711 (21535.63 per sec.) Test execution summary: total time: 100.0069s total number of events: 1075936 total time taken by event execution: 9993.0726 per-request statistics: min: 2.24ms avg: 9.29ms max: 136.91ms approx. 95 percentile: 14.55ms Threads fairness: events (avg/stddev): 10759.3600/33.57 execution time (avg/stddev): 99.9307/0.00 |
- Step3:模拟主库意外宕掉,并立刻开启Slave的io线程,模拟Slave延迟
1 2 3 4 5 6 7 8 |
# 在Master 上执行 [root@master]# killall -9 mysqld mysqld_safe # 在Slave上执行 mysql> start slave io_thread; |
- Step4:手动触发MHA切换
1 |
masterha_master_switch --master_state=dead --dead_master_host=10.3.250.88 --ignore_last_failover --global_conf=/etc/mha.cnf --conf=/etc/mha.cnf |
结果:MHA成功切换,但耗时很长,大概耗时17分钟。
注意:这里你没有再配置文件的每个Server选项中指定如下参数,那么很有可能,会切换到Slave上,而不是备选Slave,这个参数会忽略各个Slave的主从延迟,而固执的切换到你指定的备选Slave上。
check_repl_delay = 0
Case3:Master 意外宕掉,Slave正常,备选Slave落后很多binlog
- Step:测试步骤与Case2类似,这里忽略不再详述。
- 结果:MHA成功切换,但耗时很长。
Case4:Master 意外宕掉,Slave复制意外停止,备选Slave正常
- Step1:模拟Slave复制意外停止
1 2 3 |
# 在Slave执行如下SQL,直接关停主从复制 mysql> stop slave; |
- Step2:模拟主库意外宕机
1 2 3 |
# 在Master执行 [root@master]# killall -9 mysqld mysqld_safe |
- 结果:MHA成功切换,Slave被执行新的Master并成功开始复制
Case5: Master 意外宕掉,Slave正常,备选Slave复制意外停止
- Step:测试步骤与Case4类似,这里忽略不再详述。
- 结果:MHA成功切换,备选Slave成功上位,Slave指向新的Master。
Case6:Master 意外宕掉,Slave正常,备选Slave也意外宕掉
- Step1: 模拟备选Slave意外宕掉
1 2 3 |
# 在备选Slave执行 [root@slave2]# killall -9 mysqld mysqld_safe |
- Step2:模拟Master意外宕掉
1 2 3 |
# 在Master执行 [root@master]# killall -9 mysqld mysqld_safe |
- 结果:MHA切换失败,因为只剩一台Slave,也失去了切换的意义。
Case7:Master too many connection,Slave正常,备选Slave正常
- Step1:模拟Master 链接数过多的情况
1 2 3 4 5 6 7 8 9 |
[root@ master]# mysqlslap --concurrency=4000 --iterations=100 --query='insert into tc(name) select ("nxjisbxxs");' -hlocalhost -uroot -p"xxxxxxx" --socket=/tmp/mysql3306.sock --create-schema=sbtest mysqlslap: [Warning] Using a password on the command line interface can be insecure. mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections mysqlslap: Error when connecting to server: 1040 Too many connections |
- 结果:可以手动触发切换
Case8:Master 正常工作,Slave (或备选Slave)意外宕掉
- Step:测试比较简单,直接干掉Slave观察即可,这里不再详述。
- 结果:MHA无法成功切换,会报出如下错误:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
masterha_master_switch --master_state=alive --global_conf=/etc/mha.cnf --conf=/etc/mha.cnf Sat May 27 16:47:05 2017 - [info] MHA::MasterRotate version 0.56. Sat May 27 16:47:05 2017 - [info] Starting online master switch.. Sat May 27 16:47:05 2017 - [info] Sat May 27 16:47:05 2017 - [info] * Phase 1: Configuration Check Phase.. Sat May 27 16:47:05 2017 - [info] Sat May 27 16:47:05 2017 - [info] Reading default configuration from /etc/mha.cnf.. Sat May 27 16:47:05 2017 - [info] Reading application default configuration from /etc/mha.cnf.. Sat May 27 16:47:05 2017 - [info] Reading server configuration from /etc/mha.cnf.. Sat May 27 16:47:05 2017 - [info] GTID failover mode = 0 Sat May 27 16:47:05 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln93] Switching master should not be started if one or more servers is down. Sat May 27 16:47:05 2017 - [info] Dead Servers: Sat May 27 16:47:05 2017 - [info] 10.67.250.2(10.67.250.2:3306) Sat May 27 16:47:05 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53 |
Case9:Master 有大的事务或写操作在进行,Slave正常,备选Slave也正常
- Step1: 在Master模拟大事务
1 2 3 4 5 6 7 |
mysql> mysql> use test; Database changed mysql> mysql> mysql> mysql> select id , sleep(20) from kobe for update ; |
Step2:在Manager出发MHA主从切换,并观察结果
1 |
masterha_master_switch --master_state=alive --global_conf=/etc/mha.cnf --conf=/etc/mha.cnf |
日志如下 > 如果有大事务,MHA会一直等待,直到事务结束:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
Sat May 27 17:03:19 2017 - [info] MHA::MasterRotate version 0.56. Sat May 27 17:03:19 2017 - [info] Starting online master switch.. Sat May 27 17:03:19 2017 - [info] Sat May 27 17:03:19 2017 - [info] * Phase 1: Configuration Check Phase.. Sat May 27 17:03:19 2017 - [info] Sat May 27 17:03:19 2017 - [info] Reading default configuration from /etc/mha.cnf.. Sat May 27 17:03:19 2017 - [info] Reading application default configuration from /etc/mha.cnf.. Sat May 27 17:03:19 2017 - [info] Reading server configuration from /etc/mha.cnf.. Sat May 27 17:03:19 2017 - [info] GTID failover mode = 0 Sat May 27 17:03:19 2017 - [info] Current Alive Master: 10.67.250.1(10.67.250.1:3306) Sat May 27 17:03:19 2017 - [info] Alive Slaves: Sat May 27 17:03:19 2017 - [info] 10.67.250.2(10.67.250.2:3306) Version=5.7.18-14-log (oldest major version between slaves) log-bin:enabled Sat May 27 17:03:19 2017 - [info] Replicating from 10.67.250.1(10.67.250.1:3306) Sat May 27 17:03:19 2017 - [info] 10.67.250.3(10.67.250.3:3306) Version=5.7.18-14-log (oldest major version between slaves) log-bin:enabled Sat May 27 17:03:19 2017 - [info] Replicating from 10.67.250.1(10.67.250.1:3306) Sat May 27 17:03:19 2017 - [info] Primary candidate for the new Master (candidate_master is set) <span style="color: #ff0000;"><strong>It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.67.250.1(10.67.250.1:3306)? (YES/no): yes Sat May 27 17:03:21 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..</strong></span> <strong><span style="color: #ff0000;"># 一直卡在这里直到事务结束才会继续</span></strong> Sat May 27 17:04:35 2017 - [info] ok. Sat May 27 17:04:35 2017 - [info] Checking MHA is not monitoring or doing failover.. Sat May 27 17:04:35 2017 - [info] Checking replication health on 10.67.250.2.. Sat May 27 17:04:35 2017 - [info] ok. Sat May 27 17:04:35 2017 - [info] Checking replication health on 10.67.250.3.. Sat May 27 17:04:35 2017 - [info] ok. Sat May 27 17:04:35 2017 - [info] Searching new master from slaves.. Sat May 27 17:04:35 2017 - [info] Candidate masters from the configuration file: Sat May 27 17:04:35 2017 - [info] 10.67.250.3(10.67.250.3:3306) Version=5.7.18-14-log (oldest major version between slaves) log-bin:enabled Sat May 27 17:04:35 2017 - [info] Replicating from 10.67.250.1(10.67.250.1:3306) Sat May 27 17:04:35 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sat May 27 17:04:35 2017 - [info] Non-candidate masters: Sat May 27 17:04:35 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Sat May 27 17:04:35 2017 - [info] From: 10.67.250.1(10.67.250.1:3306) (current master) +--10.67.250.2(10.67.250.2:3306) +--10.67.250.3(10.67.250.3:3306) To: 10.67.250.3(10.67.250.3:3306) (new master) +--10.67.250.2(10.67.250.2:3306) Starting master switch from 10.67.250.1(10.67.250.1:3306) to 10.67.250.3(10.67.250.3:3306)? (yes/NO): Continue? (yes/NO): |
Case10:有二次检测机制下,如果Manager通过一台Slave无法连接(另一台正常)
- Step1: 模拟Manager无法正常连接Master
1 2 3 |
# 在Master上将Manager的IP地址给禁掉 [root@master ]# iptables -I INPUT -s 10.67.250.2 -j DROP |
- 结果:MHA可以手动出发切换并且成功
7.1 在确认MHA正常运行之后,我直接模拟Master主库宕机,并同时查看MHA的切换日志:
mysqladmin -uroot -pxxxxx shutdown