1. MHA High Availability Architecture
1.1 Concepts
a. MHA (Master High Availability) is an open-source MySQL high-availability program. It provides automatic master failover for master-slave architectures, can typically complete a switchover within 30 seconds while preserving data consistency, and also supports online master switching.
b. MHA has two roles: Manager and Node. The Manager is usually deployed on a separate machine, can manage multiple master-slave clusters, and probes the master at regular intervals. A Node runs on every MySQL server. When the master fails, the Manager promotes the slave with the most up-to-date data to be the new master and repoints the remaining slaves to it; the whole process is transparent to applications.
2. Hands-on Practice
2.1 Environment
| Hostname | IP address | OS / database version | Role |
| --- | --- | --- | --- |
| mysql-master171 | 192.168.75.171/24 | rhel7.9 | database master |
| mysql-master172 | 192.168.75.172/24 | rhel7.9 | database slave |
| mysql-master173 | 192.168.75.173/24 | rhel7.9 | database slave |
| mysql-mha-manager174 | 192.168.75.174/24 | rhel7.9 | MHA manager |
2.2 Preparing the Environment
2.2.1 MySQL is deployed from source; see the link below for a detailed walkthrough (the manager host only needs the source build itself, while the other three must also be set up in a master-slave relationship)
https://blog.csdn.net/2302_77662550/article/details/151225622
2.2.2 Initialize the configuration of the master, mysql-master171
a. Edit the MySQL configuration file
[root@mysql-master171 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server_id=171 #must be unique for every node in the cluster
gtid-mode=on #enable GTID mode
enforce-gtid-consistency=true #enforce GTID consistency
log-bin=binlog #enable the binary log
relay-log=relay-log #enable the relay log
relay_log_purge=0 #do not automatically purge relay logs that are no longer needed
log-slave-updates=true #write updates received through replication to this server's own binary log
b. Start MySQL
[root@mysql-master171 ~]# /etc/init.d/mysqld start
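Before starting MySQL, it can help to sanity-check the options above. Below is a minimal sketch of such a check; the here-doc stands in for /etc/my.cnf, so on a real host you would point grep at the file itself.

```shell
# Sketch: confirm the replication-related keys from the section above are set.
# The here-doc stands in for /etc/my.cnf; on a real host read the file itself.
cnf=$(cat <<'EOF'
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server_id=171
gtid-mode=on
enforce-gtid-consistency=true
log-bin=binlog
relay-log=relay-log
relay_log_purge=0
log-slave-updates=true
EOF
)
missing=0
for key in server_id gtid-mode enforce-gtid-consistency log-bin relay-log log-slave-updates; do
    if ! printf '%s\n' "$cnf" | grep -q "^${key}="; then
        echo "missing: $key"
        missing=1
    fi
done
[ "$missing" -eq 0 ] && echo "my.cnf has all replication settings"
```

The same loop run against the slaves' files would catch a forgotten key before replication is configured.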
2.2.3 Initialize the configuration of mysql-master172 and mysql-master173
Initialize mysql-master172
a. Edit the MySQL configuration file
[root@mysql-master172 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server_id=172 #must be unique for every node in the cluster
gtid-mode=on #enable GTID mode
enforce-gtid-consistency=true #enforce GTID consistency
log-bin=binlog #enable the binary log
relay-log=relay-log #enable the relay log
relay_log_purge=0 #do not automatically purge relay logs that are no longer needed
log-slave-updates=true #write updates received through replication to this server's own binary log
b. Start MySQL
[root@mysql-master172 ~]# /etc/init.d/mysqld start
Initialize mysql-master173
a. Edit the MySQL configuration file
[root@mysql-master173 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server_id=173 #must be unique for every node in the cluster
gtid-mode=on #enable GTID mode
enforce-gtid-consistency=true #enforce GTID consistency
log-bin=binlog #enable the binary log
relay-log=relay-log #enable the relay log
relay_log_purge=0 #do not automatically purge relay logs that are no longer needed
log-slave-updates=true #write updates received through replication to this server's own binary log
b. Start MySQL
[root@mysql-master173 ~]# /etc/init.d/mysqld start
2.2.4 Configure the one-master, multi-slave replication topology
Configuration on the master, mysql-master171
a. Create a MySQL user named rep that may log in from any host, with password 123
mysql> create user 'rep'@'%' identified WITH mysql_native_password by '123';
b. Grant the rep user the replication slave privilege
mysql> grant replication slave on *.* to 'rep'@'%';
Note: do not run these two statements on the slaves. The slaves replay the master's binary log (binlog), and both statements above are recorded in it, so each slave automatically creates the rep user and grants it the replication privilege.
Configuration on the slave mysql-master172
a. Configure the connection to the master and the replication parameters
mysql> CHANGE REPLICATION SOURCE TO
SOURCE_HOST='192.168.75.171',
SOURCE_USER='rep',
SOURCE_PASSWORD='123',
SOURCE_AUTO_POSITION=1,
SOURCE_SSL=1;
b. Start the replication threads
mysql> start replica;
c. Check the replication status
mysql> show replica status \G
Configuration on the slave mysql-master173
a. Configure the connection to the master and the replication parameters
mysql> CHANGE REPLICATION SOURCE TO
SOURCE_HOST='192.168.75.171',
SOURCE_USER='rep',
SOURCE_PASSWORD='123',
SOURCE_AUTO_POSITION=1,
SOURCE_SSL=1;
b. Start the replication threads
mysql> start replica;
c. Check the replication status
mysql> show replica status \G
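In the `show replica status` output, the two thread states matter most. Here is a small illustrative helper that checks them; the here-doc embeds a sample of the output, while in practice you would pipe in `mysql -e 'show replica status\G'`.

```shell
# Sketch: verify both replication threads report Yes.
# The here-doc is sample `show replica status\G` output for demonstration.
status=$(cat <<'EOF'
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 192.168.75.171
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
EOF
)
io=$(printf '%s\n' "$status" | awk -F': ' '/Replica_IO_Running/{print $2}')
sql=$(printf '%s\n' "$status" | awk -F': ' '/Replica_SQL_Running/{print $2}')
if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
    echo "replication healthy"
else
    echo "replication broken: IO=$io SQL=$sql"
fi
```

Anything other than Yes/Yes means the slave needs attention before MHA is layered on top.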
2.3 Configuring MHA
2.3.1 Install the MHA packages; download address:
https://code.google.com/archive/p/mysql-master-ha/
Install the packages on mysql-mha-manager174 (if you cannot download them, get them from my resource bundle)
Upload them via Xftp
2.3.2 Install unzip
[root@mysql-mha-manager174 ~]# yum install unzip -y
2.3.3 Extract the MHA archive
[root@mysql-mha-manager174 ~]# unzip MHA-7.zip
2.3.4 Install the downloaded RPM packages locally
[root@mysql-mha-manager174 ~]# cd MHA-7/
[root@mysql-mha-manager174 MHA-7]# yum localinstall *.rpm
2.3.5 Configure hostname resolution on all hosts (do not skip this step!)
[root@mysql-mha-manager174 ~]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.75.171 mysql-master171
192.168.75.172 mysql-master172
192.168.75.173 mysql-master173
192.168.75.174 mysql-mha-manager174
2.3.6 From mysql-mha-manager174, copy mha4mysql-node to the other three database servers and install it
#you will be prompted for each host's password and to answer yes
[root@mysql-mha-manager174 MHA-7]# for i in {mysql-master171,mysql-master172,mysql-master173}; do scp mha4mysql-node-0.58-0.el7.centos.noarch.rpm root@$i:/root ; done
#install mha4mysql-node on each of the three database servers
[root@mysql-master171 ~]# yum localinstall mha4mysql-node-0.58-0.el7.centos.noarch.rpm
2.3.7 On mysql-master171, create a user that MHA can use to access MySQL, and grant it administrative privileges
mysql> create user 'mhaadm'@'%' identified WITH mysql_native_password by '123';
mysql> grant all on *.* to 'mhaadm'@'%';
Note: this must be run on the master; it replicates to all slaves through the binlog, keeping privileges consistent across the MHA cluster
2.3.8 Set up passwordless SSH on all hosts (all hosts in the MHA cluster must be able to reach each other over SSH for remote control and data management)
Method 1:
a. Generate a key pair on every host
[root@mysql-mha-manager174 ~]# ssh-keygen -f /root/.ssh/id_rsa -P '' -q
b. Upload every host's public key to the mysql-mha-manager174 host
[root@mysql-master171 ~]# ssh-copy-id root@mysql-mha-manager174
[root@mysql-master172 ~]# ssh-copy-id root@mysql-mha-manager174
[root@mysql-master173 ~]# ssh-copy-id root@mysql-mha-manager174
[root@mysql-mha-manager174 ~]# ssh-copy-id root@mysql-mha-manager174
c. Distribute the manager host's authorized_keys to the other hosts
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173};do scp /root/.ssh/authorized_keys root@$i:/root/.ssh;done
d. Verify the setup
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173};do ssh $i hostname;done
mysql-master171
mysql-master172
mysql-master173
Method 2:
#push the management node's public key to every database host
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173}; do ssh-copy-id root@$i ; done
#copy the private key to every database host
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173}; do scp /root/.ssh/id_rsa root@$i:/root/.ssh; done
#verify passwordless connectivity
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173};do ssh $i hostname ;done
2.3.9 Create the configuration directories on mysql-mha-manager174
#/etc/mha holds MHA's main configuration file
#/var/log/mha/app1 is MHA's log directory
[root@mysql-mha-manager174 ~]# mkdir -p /etc/mha /var/log/mha/app1
2.3.10 Edit the core MHA cluster configuration file
[root@mysql-mha-manager174 ~]# vim /etc/mha/app1.cnf
[server default] #defaults that apply to server1, server2 and server3
user=mhaadm #MHA management user
password=123 #password of the MHA management user
manager_workdir=/var/log/mha/app1 #MHA working directory
manager_log=/var/log/mha/app1/manager.log #MHA log file
ssh_user=root #user for SSH key-based authentication
repl_user=rep #replication account
repl_password=123
ping_interval=1 #interval in seconds between pings of the master
[server1] #MySQL host information
hostname=192.168.75.171
ssh_port=22
candidate_master=1 #this host may be promoted to master during a failover
[server2]
hostname=192.168.75.172
ssh_port=22
candidate_master=1
[server3]
hostname=192.168.75.173
ssh_port=22
no_master=1 #never promote this host to master
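Before running MHA's own checks, a quick lint of app1.cnf can catch a missing key early. This is an illustrative sketch; the here-doc mirrors the file above, and on the real manager you would point grep at /etc/mha/app1.cnf.

```shell
# Sketch: check required [server default] keys and count candidate masters.
# The here-doc mirrors /etc/mha/app1.cnf from the section above.
cfg=$(cat <<'EOF'
[server default]
user=mhaadm
password=123
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
ssh_user=root
repl_user=rep
repl_password=123
ping_interval=1
[server1]
hostname=192.168.75.171
candidate_master=1
[server2]
hostname=192.168.75.172
candidate_master=1
[server3]
hostname=192.168.75.173
no_master=1
EOF
)
missing=$(for key in user password ssh_user repl_user repl_password ping_interval; do
    printf '%s\n' "$cfg" | grep -q "^${key}=" || echo "$key"
done)
[ -z "$missing" ] && echo "all required keys present"
candidates=$(printf '%s\n' "$cfg" | grep -c '^candidate_master=1')
echo "candidate masters: $candidates"
```

With this topology the count should be 2 (server1 and server2); server3 carries no_master=1 and is never promoted.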
a. Check that the hosts can reach each other over SSH
[root@mysql-mha-manager174 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
b. Check result
[root@mysql-mha-manager174 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat Sep 6 01:56:49 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Sep 6 01:56:49 2025 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Sep 6 01:56:49 2025 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Sep 6 01:56:49 2025 - [info] Starting SSH connection tests..
Sat Sep 6 01:56:49 2025 - [debug]
Sat Sep 6 01:56:49 2025 - [debug] Connecting via SSH from root@192.168.75.171(192.168.75.171:22) to root@192.168.75.172(192.168.75.172:22)..
Sat Sep 6 01:56:49 2025 - [debug] ok.
Sat Sep 6 01:56:49 2025 - [debug] Connecting via SSH from root@192.168.75.171(192.168.75.171:22) to root@192.168.75.173(192.168.75.173:22)..
Sat Sep 6 01:56:49 2025 - [debug] ok.
Sat Sep 6 01:56:50 2025 - [debug]
Sat Sep 6 01:56:49 2025 - [debug] Connecting via SSH from root@192.168.75.172(192.168.75.172:22) to root@192.168.75.171(192.168.75.171:22)..
Sat Sep 6 01:56:49 2025 - [debug] ok.
Sat Sep 6 01:56:49 2025 - [debug] Connecting via SSH from root@192.168.75.172(192.168.75.172:22) to root@192.168.75.173(192.168.75.173:22)..
Sat Sep 6 01:56:50 2025 - [debug] ok.
Sat Sep 6 01:56:51 2025 - [debug]
Sat Sep 6 01:56:50 2025 - [debug] Connecting via SSH from root@192.168.75.173(192.168.75.173:22) to root@192.168.75.171(192.168.75.171:22)..
Sat Sep 6 01:56:50 2025 - [debug] ok.
Sat Sep 6 01:56:50 2025 - [debug] Connecting via SSH from root@192.168.75.173(192.168.75.173:22) to root@192.168.75.172(192.168.75.172:22)..
Sat Sep 6 01:56:50 2025 - [debug] ok.
Sat Sep 6 01:56:51 2025 - [info] All SSH connection tests passed successfully.
c. Check that the MySQL replication cluster's connection parameters are correct
[root@mysql-mha-manager174 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
d. Check result
[root@mysql-mha-manager174 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Sat Sep 6 01:57:12 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Sep 6 01:57:12 2025 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Sep 6 01:57:12 2025 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Sep 6 01:57:12 2025 - [info] MHA::MasterMonitor version 0.58.
Sat Sep 6 01:57:13 2025 - [info] GTID failover mode = 1
Sat Sep 6 01:57:13 2025 - [info] Dead Servers:
Sat Sep 6 01:57:13 2025 - [info] Alive Servers:
Sat Sep 6 01:57:13 2025 - [info] 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 01:57:13 2025 - [info] 192.168.75.172(192.168.75.172:3306)
Sat Sep 6 01:57:13 2025 - [info] 192.168.75.173(192.168.75.173:3306)
Sat Sep 6 01:57:13 2025 - [info] Alive Slaves:
Sat Sep 6 01:57:13 2025 - [info] 192.168.75.172(192.168.75.172:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 01:57:13 2025 - [info] GTID ON
Sat Sep 6 01:57:13 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 01:57:13 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Sep 6 01:57:13 2025 - [info] 192.168.75.173(192.168.75.173:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 01:57:13 2025 - [info] GTID ON
Sat Sep 6 01:57:13 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 01:57:13 2025 - [info] Not candidate for the new Master (no_master is set)
Sat Sep 6 01:57:13 2025 - [info] Current Alive Master: 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 01:57:13 2025 - [info] Checking slave configurations..
Sat Sep 6 01:57:13 2025 - [info] read_only=1 is not set on slave 192.168.75.172(192.168.75.172:3306).
Sat Sep 6 01:57:13 2025 - [info] read_only=1 is not set on slave 192.168.75.173(192.168.75.173:3306).
Sat Sep 6 01:57:13 2025 - [info] Checking replication filtering settings..
Sat Sep 6 01:57:13 2025 - [info] binlog_do_db= , binlog_ignore_db=
Sat Sep 6 01:57:13 2025 - [info] Replication filtering check ok.
Sat Sep 6 01:57:13 2025 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Sep 6 01:57:13 2025 - [info] Checking SSH publickey authentication settings on the current master..
Sat Sep 6 01:57:13 2025 - [info] HealthCheck: SSH to 192.168.75.171 is reachable.
Sat Sep 6 01:57:13 2025 - [info]
192.168.75.171(192.168.75.171:3306) (current master)
+--192.168.75.172(192.168.75.172:3306)
+--192.168.75.173(192.168.75.173:3306)
Sat Sep 6 01:57:13 2025 - [info] Checking replication health on 192.168.75.172..
Sat Sep 6 01:57:13 2025 - [info] ok.
Sat Sep 6 01:57:13 2025 - [info] Checking replication health on 192.168.75.173..
Sat Sep 6 01:57:13 2025 - [info] ok.
Sat Sep 6 01:57:13 2025 - [warning] master_ip_failover_script is not defined.
Sat Sep 6 01:57:13 2025 - [warning] shutdown_script is not defined.
Sat Sep 6 01:57:13 2025 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
3. MHA Failover
3.1 Concepts
When the master fails, the Manager detects the failure, automatically promotes the slave with the most up-to-date data to be the new master, repoints the other slaves to it, and keeps the data consistent. Manual failover with MHA is fairly tedious; automatic failover is much more convenient.
3.2 Manual master failover
3.2.1 Simulate a failure of mysql-master171
[root@mysql-master171]# /etc/init.d/mysqld stop
3.2.2 Perform the failover on mysql-mha-manager174 (--ignore_last_failover ignores the lock file that a previous switchover left in /var/log/mha/app1/)
[root@mysql-mha-manager174 ~]# masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=192.168.75.171 --dead_master_port=3306 --new_master_host=192.168.75.172 --new_master_port=3306 --ignore_last_failover
#masterha_master_switch performs manual/automatic switchover
#--master_state=dead declares the original master failed
#--conf=/etc/mha/app1.cnf path to the MHA cluster configuration file, from which MHA reads node information, credentials, and so on
#--dead_master_host=192.168.75.171 the failed original master
#--dead_master_port=3306 MySQL port of the original master
#--new_master_host=192.168.75.172 IP of the slave to promote to new master
#--new_master_port=3306 MySQL port of the new master
#--ignore_last_failover ignore the cooldown period from the previous failover
Test result (mysql-master172 has become the new master)
[root@mysql-mha-manager174 ~]# masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=192.168.75.171 --dead_master_port=3306 --new_master_host=192.168.75.172 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.75.171.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Sat Sep 6 02:07:01 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Sep 6 02:07:01 2025 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Sep 6 02:07:01 2025 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Sep 6 02:07:01 2025 - [info] MHA::MasterFailover version 0.58.
Sat Sep 6 02:07:01 2025 - [info] Starting master failover.
Sat Sep 6 02:07:01 2025 - [info]
Sat Sep 6 02:07:01 2025 - [info] * Phase 1: Configuration Check Phase..
Sat Sep 6 02:07:01 2025 - [info]
Sat Sep 6 02:07:02 2025 - [info] GTID failover mode = 1
Sat Sep 6 02:07:02 2025 - [info] Dead Servers:
Sat Sep 6 02:07:02 2025 - [info] 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:02 2025 - [info] Checking master reachability via MySQL(double check)...
Sat Sep 6 02:07:02 2025 - [info] ok.
Sat Sep 6 02:07:02 2025 - [info] Alive Servers:
Sat Sep 6 02:07:02 2025 - [info] 192.168.75.172(192.168.75.172:3306)
Sat Sep 6 02:07:02 2025 - [info] 192.168.75.173(192.168.75.173:3306)
Sat Sep 6 02:07:02 2025 - [info] Alive Slaves:
Sat Sep 6 02:07:02 2025 - [info] 192.168.75.172(192.168.75.172:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 02:07:02 2025 - [info] GTID ON
Sat Sep 6 02:07:02 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:02 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Sep 6 02:07:02 2025 - [info] 192.168.75.173(192.168.75.173:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 02:07:02 2025 - [info] GTID ON
Sat Sep 6 02:07:02 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:02 2025 - [info] Not candidate for the new Master (no_master is set)
Master 192.168.75.171(192.168.75.171:3306) is dead. Proceed? (yes/NO): yes
Sat Sep 6 02:07:12 2025 - [info] Starting GTID based failover.
Sat Sep 6 02:07:12 2025 - [info]
Sat Sep 6 02:07:12 2025 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Sep 6 02:07:12 2025 - [info]
Sat Sep 6 02:07:12 2025 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Sep 6 02:07:12 2025 - [info]
Sat Sep 6 02:07:12 2025 - [info] HealthCheck: SSH to 192.168.75.171 is reachable.
Sat Sep 6 02:07:12 2025 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Sep 6 02:07:12 2025 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Sat Sep 6 02:07:12 2025 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Sep 6 02:07:12 2025 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Sep 6 02:07:12 2025 - [info]
Sat Sep 6 02:07:12 2025 - [info] * Phase 3: Master Recovery Phase..
Sat Sep 6 02:07:12 2025 - [info]
Sat Sep 6 02:07:12 2025 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Sep 6 02:07:12 2025 - [info]
Sat Sep 6 02:07:12 2025 - [info] The latest binary log file/position on all slaves is binlog.000003:197
Sat Sep 6 02:07:12 2025 - [info] Retrieved Gtid Set: c109ec67-896c-11f0-90d5-000c294df9ee:1-7
Sat Sep 6 02:07:12 2025 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Sep 6 02:07:12 2025 - [info] 192.168.75.172(192.168.75.172:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 02:07:12 2025 - [info] GTID ON
Sat Sep 6 02:07:12 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:12 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Sep 6 02:07:12 2025 - [info] 192.168.75.173(192.168.75.173:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 02:07:12 2025 - [info] GTID ON
Sat Sep 6 02:07:12 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:12 2025 - [info] Not candidate for the new Master (no_master is set)
Sat Sep 6 02:07:12 2025 - [info] The oldest binary log file/position on all slaves is binlog.000003:197
Sat Sep 6 02:07:12 2025 - [info] Retrieved Gtid Set: c109ec67-896c-11f0-90d5-000c294df9ee:1-7
Sat Sep 6 02:07:12 2025 - [info] Oldest slaves:
Sat Sep 6 02:07:12 2025 - [info] 192.168.75.172(192.168.75.172:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 02:07:12 2025 - [info] GTID ON
Sat Sep 6 02:07:12 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:12 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Sep 6 02:07:12 2025 - [info] 192.168.75.173(192.168.75.173:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 02:07:12 2025 - [info] GTID ON
Sat Sep 6 02:07:12 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:12 2025 - [info] Not candidate for the new Master (no_master is set)
Sat Sep 6 02:07:12 2025 - [info]
Sat Sep 6 02:07:12 2025 - [info] * Phase 3.3: Determining New Master Phase..
Sat Sep 6 02:07:12 2025 - [info]
Sat Sep 6 02:07:12 2025 - [info] Searching new master from slaves..
Sat Sep 6 02:07:12 2025 - [info] Candidate masters from the configuration file:
Sat Sep 6 02:07:12 2025 - [info] 192.168.75.172(192.168.75.172:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 02:07:12 2025 - [info] GTID ON
Sat Sep 6 02:07:12 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:12 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Sep 6 02:07:12 2025 - [info] Non-candidate masters:
Sat Sep 6 02:07:12 2025 - [info] 192.168.75.173(192.168.75.173:3306) Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep 6 02:07:12 2025 - [info] GTID ON
Sat Sep 6 02:07:12 2025 - [info] Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep 6 02:07:12 2025 - [info] Not candidate for the new Master (no_master is set)
Sat Sep 6 02:07:12 2025 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Sat Sep 6 02:07:12 2025 - [info] New master is 192.168.75.172(192.168.75.172:3306)
Sat Sep 6 02:07:12 2025 - [info] Starting master failover..
Sat Sep 6 02:07:12 2025 - [info]
From:
192.168.75.171(192.168.75.171:3306) (current master)
+--192.168.75.172(192.168.75.172:3306)
+--192.168.75.173(192.168.75.173:3306)
To:
192.168.75.172(192.168.75.172:3306) (new master)
+--192.168.75.173(192.168.75.173:3306)
Starting master switch from 192.168.75.171(192.168.75.171:3306) to 192.168.75.172(192.168.75.172:3306)? (yes/NO): yes
Sat Sep 6 02:07:14 2025 - [info] New master decided manually is 192.168.75.172(192.168.75.172:3306)
Sat Sep 6 02:07:14 2025 - [info]
Sat Sep 6 02:07:14 2025 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Sep 6 02:07:14 2025 - [info]
Sat Sep 6 02:07:14 2025 - [info] Waiting all logs to be applied..
Sat Sep 6 02:07:14 2025 - [info] done.
Sat Sep 6 02:07:14 2025 - [info] Getting new master's binlog name and position..
Sat Sep 6 02:07:14 2025 - [info] binlog.000003:1141
Sat Sep 6 02:07:14 2025 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.75.172', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Sat Sep 6 02:07:14 2025 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: binlog.000003, 1141, 3ea421e4-896e-11f0-988a-000c29c5caee:1,
c109ec67-896c-11f0-90d5-000c294df9ee:1-7
Sat Sep 6 02:07:14 2025 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Sat Sep 6 02:07:14 2025 - [info] ** Finished master recovery successfully.
Sat Sep 6 02:07:14 2025 - [info] * Phase 3: Master Recovery Phase completed.
Sat Sep 6 02:07:14 2025 - [info]
Sat Sep 6 02:07:14 2025 - [info] * Phase 4: Slaves Recovery Phase..
Sat Sep 6 02:07:14 2025 - [info]
Sat Sep 6 02:07:14 2025 - [info]
Sat Sep 6 02:07:14 2025 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Sep 6 02:07:14 2025 - [info]
Sat Sep 6 02:07:14 2025 - [info] -- Slave recovery on host 192.168.75.173(192.168.75.173:3306) started, pid: 2914. Check tmp log /var/log/mha/app1/192.168.75.173_3306_20250906020701.log if it takes time..
Sat Sep 6 02:07:16 2025 - [info]
Sat Sep 6 02:07:16 2025 - [info] Log messages from 192.168.75.173 ...
Sat Sep 6 02:07:16 2025 - [info]
Sat Sep 6 02:07:14 2025 - [info] Resetting slave 192.168.75.173(192.168.75.173:3306) and starting replication from the new master 192.168.75.172(192.168.75.172:3306)..
Sat Sep 6 02:07:14 2025 - [info] Executed CHANGE MASTER.
Sat Sep 6 02:07:15 2025 - [info] Slave started.
Sat Sep 6 02:07:15 2025 - [info] gtid_wait(3ea421e4-896e-11f0-988a-000c29c5caee:1,
c109ec67-896c-11f0-90d5-000c294df9ee:1-7) completed on 192.168.75.173(192.168.75.173:3306). Executed 0 events.
Sat Sep 6 02:07:16 2025 - [info] End of log messages from 192.168.75.173.
Sat Sep 6 02:07:16 2025 - [info] -- Slave on host 192.168.75.173(192.168.75.173:3306) started.
Sat Sep 6 02:07:16 2025 - [info] All new slave servers recovered successfully.
Sat Sep 6 02:07:16 2025 - [info]
Sat Sep 6 02:07:16 2025 - [info] * Phase 5: New master cleanup phase..
Sat Sep 6 02:07:16 2025 - [info]
Sat Sep 6 02:07:16 2025 - [info] Resetting slave info on the new master..
Sat Sep 6 02:07:16 2025 - [info] 192.168.75.172: Resetting slave info succeeded.
Sat Sep 6 02:07:16 2025 - [info] Master failover to 192.168.75.172(192.168.75.172:3306) completed successfully.
Sat Sep 6 02:07:16 2025 - [info]
----- Failover Report -----
app: MySQL Master failover 192.168.75.171(192.168.75.171:3306) to 192.168.75.172(192.168.75.172:3306) succeeded
Master 192.168.75.171(192.168.75.171:3306) is down!
Check MHA Manager logs at mysql-mha-manager174 for details.
Started manual(interactive) failover.
Selected 192.168.75.172(192.168.75.172:3306) as a new master.
192.168.75.172(192.168.75.172:3306): OK: Applying all logs succeeded.
192.168.75.173(192.168.75.173:3306): OK: Slave started, replicating from 192.168.75.172(192.168.75.172:3306)
192.168.75.172(192.168.75.172:3306): Resetting slave info succeeded.
Master failover to 192.168.75.172(192.168.75.172:3306) completed successfully.
3.2.3 Recover the failed MySQL node
a. Rejoin mysql-master171 to the cluster
[root@mysql-master171 ~]# /etc/init.d/mysqld start
Starting MySQL. SUCCESS!
[root@mysql-master171 ~]# mysql -uroot -p123
mysql> CHANGE REPLICATION SOURCE TO
-> SOURCE_HOST='192.168.75.172',
-> SOURCE_USER='rep',
-> SOURCE_PASSWORD='123',
-> SOURCE_AUTO_POSITION=1,
-> SOURCE_SSL=1;
Query OK, 0 rows affected, 2 warnings (0.07 sec)
b. Start the replication threads
mysql> start replica;
Query OK, 0 rows affected (0.01 sec)
c. Check the replication status
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 192.168.75.172 #the master is now mysql-master172
Source_User: rep
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: binlog.000004
Read_Source_Log_Pos: 237
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 367
Relay_Source_Log_File: binlog.000004
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
3.2.4 Start MHA so that a MySQL failure triggers automatic failover
a. Delete the lock file
#if this file is not deleted, the next MHA failover will fail
#alternatively, start with --ignore_last_failover to ignore the lock file
[root@mysql-mha-manager174 ~]# rm -rf /var/log/mha/app1/app1.failover.complete
b. Start MHA
[root@mysql-mha-manager174 ~]# masterha_manager --conf=/etc/mha/app1.cnf
c. Check MHA's status
[root@mysql-mha-manager174 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
d. Simulate a crash of the current master, mysql-master172
[root@mysql-master172 ~]# /etc/init.d/mysqld stop
Note: after the failover completes, the MHA manager process exits automatically
e. View the failover log
[root@mysql-mha-manager174 ~]# cat /var/log/mha/app1/manager.log
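The log is long; the line that matters is the final success message. A sketch of extracting the promoted master from it (the sample line is the success message from the earlier manual switchover; on the real manager you would grep /var/log/mha/app1/manager.log):

```shell
# Sketch: pull the promoted master's IP out of MHA's success message.
# On the manager: grep 'completed successfully' /var/log/mha/app1/manager.log
logline='Sat Sep  6 02:07:16 2025 - [info] Master failover to 192.168.75.172(192.168.75.172:3306) completed successfully.'
new_master=$(printf '%s\n' "$logline" | sed -n 's/.*Master failover to \([0-9.]*\)(.*completed successfully\..*/\1/p')
echo "new master: $new_master"
```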
f. On mysql-master173, check which host is now the master (it has switched to mysql-master171)
mysql> show replica status \G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 192.168.75.171
Source_User: rep
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: binlog.000004
Read_Source_Log_Pos: 524
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 411
Relay_Source_Log_File: binlog.000004
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
g. Delete the lock file
[root@mysql-mha-manager174 ~]# rm -rf /var/log/mha/app1/app1.failover.complete
h. Recover the failed mysql-master172
Start mysql-master172
[root@mysql-master172 ~]# /etc/init.d/mysqld start
Starting MySQL. SUCCESS!
Configure it as a slave
[root@mysql-master172 ~]# mysql -uroot -p123
mysql> CHANGE REPLICATION SOURCE TO
-> SOURCE_HOST='192.168.75.171',
-> SOURCE_USER='rep',
-> SOURCE_PASSWORD='123',
-> SOURCE_AUTO_POSITION=1,
-> SOURCE_SSL=1;
Query OK, 0 rows affected, 2 warnings (0.06 sec)
Start the replication threads
mysql> start replica;
Query OK, 0 rows affected (0.02 sec)
4. Configuring a VIP
4.1 Concepts
There are two ways to manage the VIP: have keepalived float the virtual IP, or bring it up and down with a script. To reduce the risk of split-brain, the script approach is recommended in production; it needs no extra software such as keepalived and is more reliable.
4.2 Configuration
4.2.1 Write the failover script
a. Place the script under /usr/local/bin/
[root@mysql-mha-manager174 ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
#note: adjust the VIP address and network interface name here for your environment
my $vip = '192.168.75.88/24';
my $ssh_start_vip = "/sbin/ip a add $vip dev eth0";
my $ssh_stop_vip = "/sbin/ip a del $vip dev eth0";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
b. Make the script executable
[root@mysql-mha-manager174 ~]# chmod +x /usr/local/bin/master_ip_failover
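For reference, these are the commands the script ends up running over SSH, rebuilt here from the same $vip and interface values; this is an illustration only, not the script itself, and the angle-bracket host placeholders are hypothetical.

```shell
# Illustration: the ssh commands master_ip_failover issues when moving the VIP,
# built from the same VIP/interface values used in the script above.
vip='192.168.75.88/24'
dev='eth0'
start_cmd="/sbin/ip a add $vip dev $dev"
stop_cmd="/sbin/ip a del $vip dev $dev"
echo "on the new master: ssh root@<new_master_host> \"$start_cmd\""
echo "on the old master: ssh root@<orig_master_host> \"$stop_cmd\""
```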
4.2.2 Update the MHA configuration file on mysql-mha-manager174
#add the following line under [server default]
[root@mysql-mha-manager174 ~]# vim /etc/mha/app1.cnf
[server default]
master_ip_failover_script=/usr/local/bin/master_ip_failover #path to the script that manages the virtual IP during master failover
4.2.3 Manually bring up the VIP on mysql-master171 the first time
[root@mysql-master171]# ip a a 192.168.75.88/24 dev eth0
[root@mysql-master171]# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:d6:ad:c7 brd ff:ff:ff:ff:ff:ff
inet 192.168.75.171/24 brd 192.168.75.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 192.168.75.88/24 scope global secondary eth0
4.2.4 Start MHA on mysql-mha-manager174 (it runs in the foreground, so open another terminal for further work)
[root@mysql-mha-manager174 ~]# masterha_manager --conf=/etc/mha/app1.cnf --ignore_last_failover
4.2.5 Stop the MySQL service on mysql-master171 to simulate a master crash
[root@mysql-master171]# /etc/init.d/mysqld stop
4.2.6 Test
a. Check the VIP on mysql-master172
[root@mysql-master172 ~]# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:c5:ca:ee brd ff:ff:ff:ff:ff:ff
inet 192.168.75.172/24 brd 192.168.75.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 192.168.75.88/24 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::75c2:222:6e68:3095/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::269a:d353:bdad:ccfc/64 scope link noprefixroute
valid_lft forever preferred_lft forever
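To script the check above instead of eyeballing it, something like the following works; it is a sketch where the here-doc stands in for live `ip a show eth0` output.

```shell
# Sketch: detect whether this host currently holds the VIP.
# The here-doc stands in for `ip a show eth0` on the live host.
out=$(cat <<'EOF'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
    inet 192.168.75.172/24 brd 192.168.75.255 scope global noprefixroute eth0
    inet 192.168.75.88/24 scope global secondary eth0
EOF
)
if printf '%s\n' "$out" | grep -q 'inet 192\.168\.75\.88/'; then
    vip_here=yes
else
    vip_here=no
fi
echo "VIP on this host: $vip_here"
```

Running the same check on the old master after failover should report no, confirming the script removed the address there.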