MySQL Cluster High Availability (the MHA Architecture)

Published 2025-09-07

1. The MHA High Availability Architecture

1.1 Concepts

a. MHA (Master High Availability) is an open-source MySQL high-availability program. It provides automatic master failover for master-slave topologies, can complete a switchover within about 30 seconds while preserving data consistency, and also supports online master switching.

b. MHA has two roles: Manager and Node. The Manager is usually deployed on a dedicated machine and can manage several master-slave clusters, probing each master at regular intervals. A Node runs on every MySQL server. When the master fails, the Manager promotes the slave holding the most up-to-date data to be the new master and repoints the remaining slaves at it; the whole process is transparent to applications.

2. Hands-On Deployment

2.1 Environment

Hostname               IP address          OS / MySQL version        Role
mysql-master171        192.168.75.171/24   rhel7.9 / MySQL 8.0.40    master database server
mysql-master172        192.168.75.172/24   rhel7.9 / MySQL 8.0.40    slave database server
mysql-master173        192.168.75.173/24   rhel7.9 / MySQL 8.0.40    slave database server
mysql-mha-manager174   192.168.75.174/24   rhel7.9                   MHA manager

2.2 Preparation

2.2.1 MySQL source-based installation is covered in detail in the link below (the manager node only needs MySQL installed; the other three hosts must also be configured as master and slaves)

https://blog.csdn.net/2302_77662550/article/details/151225622

2.2.2 Initialize the master, mysql-master171

a. Edit the MySQL configuration file
[root@mysql-master171 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server_id=171   # must be unique for every node in the cluster
gtid-mode=on  # enable GTID-based replication
enforce-gtid-consistency=true  # enforce GTID consistency
log-bin=binlog  # enable the binary log
relay-log=relay-log  # enable the relay log
relay_log_purge=0  # keep applied relay logs instead of purging them (MHA reads them during failover)
log-slave-updates=true  # write replicated updates to this server's own binary log

b. Start MySQL

[root@mysql-master171 ~]# /etc/init.d/mysqld start

2.2.3 Initialize mysql-master172 and mysql-master173

Initialize mysql-master172:
a. Edit the MySQL configuration file
[root@mysql-master172 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server_id=172   # must be unique for every node in the cluster
gtid-mode=on  # enable GTID-based replication
enforce-gtid-consistency=true  # enforce GTID consistency
log-bin=binlog  # enable the binary log
relay-log=relay-log  # enable the relay log
relay_log_purge=0  # keep applied relay logs instead of purging them (MHA reads them during failover)
log-slave-updates=true  # write replicated updates to this server's own binary log
b. Start MySQL
[root@mysql-master172 ~]# /etc/init.d/mysqld start

Initialize mysql-master173:
a. Edit the MySQL configuration file
[root@mysql-master173 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server_id=173   # must be unique for every node in the cluster
gtid-mode=on  # enable GTID-based replication
enforce-gtid-consistency=true  # enforce GTID consistency
log-bin=binlog  # enable the binary log
relay-log=relay-log  # enable the relay log
relay_log_purge=0  # keep applied relay logs instead of purging them (MHA reads them during failover)
log-slave-updates=true  # write replicated updates to this server's own binary log
b. Start MySQL
[root@mysql-master173 ~]# /etc/init.d/mysqld start
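The three node configurations above differ only in server_id, so they can be rendered from one template. A minimal sketch (a hypothetical helper, not part of MHA) that writes my.cnf.171, my.cnf.172 and my.cnf.173 into the current directory:

```shell
#!/bin/sh
# Render one my.cnf per node; only server_id differs between nodes.
render_mycnf() {
  id="$1"
  cat <<EOF
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server_id=${id}
gtid-mode=on
enforce-gtid-consistency=true
log-bin=binlog
relay-log=relay-log
relay_log_purge=0
log-slave-updates=true
EOF
}

# Assumed node IDs, matching the lab hosts above.
for id in 171 172 173; do
  render_mycnf "$id" > "my.cnf.${id}"
done
```

Each generated file can then be copied to /etc/my.cnf on the corresponding host.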

2.2.4 Set up one-master, multi-slave replication

On the master, mysql-master171:
a. Create a MySQL user named rep, allowed to log in from any host, with password 123
mysql> create user 'rep'@'%' identified with mysql_native_password by '123';
b. Grant the rep user the replication privilege
mysql> grant replication slave on *.* to 'rep'@'%';

Note: the slaves do not need to run these two statements. Both of them are recorded in the master's binary log (binlog), and the slaves replay that log, so the rep user and its replication grant appear on every slave automatically.

On slave mysql-master172:
a. Point the slave at the master and set the replication parameters
mysql> CHANGE REPLICATION SOURCE TO
SOURCE_HOST='192.168.75.171',
SOURCE_USER='rep',
SOURCE_PASSWORD='123',
SOURCE_AUTO_POSITION=1,
SOURCE_SSL=1;
b. Start replication
mysql> start replica;
c. Check the replication status
mysql> show replica status \G

On slave mysql-master173:
a. Point the slave at the master and set the replication parameters
mysql> CHANGE REPLICATION SOURCE TO
SOURCE_HOST='192.168.75.171',
SOURCE_USER='rep',
SOURCE_PASSWORD='123',
SOURCE_AUTO_POSITION=1,
SOURCE_SSL=1;
b. Start replication
mysql> start replica;
c. Check the replication status
mysql> show replica status \G

2.3 Configure MHA

2.3.1 Install the MHA packages. Download:

https://code.google.com/archive/p/mysql-master-ha/

Install the packages on mysql-mha-manager174 (if the download is unavailable, get them from my resource bundle)

Upload them via Xftp

2.3.2 Install unzip

[root@mysql-mha-manager174 ~]# yum install unzip -y

2.3.3 Unpack the MHA archive

[root@mysql-mha-manager174 ~]# unzip MHA-7.zip

2.3.4 Install the downloaded RPM packages locally

[root@mysql-mha-manager174 ~]# cd MHA-7/
[root@mysql-mha-manager174 MHA-7]# yum localinstall *.rpm

2.3.5 Configure name resolution on every host (do not skip this!)

[root@mysql-mha-manager174 ~]# vim  /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.75.171    mysql-master171
192.168.75.172    mysql-master172
192.168.75.173    mysql-master173
192.168.75.174    mysql-mha-manager174
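If the preparation steps are re-run, the block above can end up in /etc/hosts more than once. A small idempotent sketch (hypothetical helper; it writes to a scratch file here, so point HOSTS at /etc/hosts on the real machines) that only appends a mapping when the name is missing:

```shell
#!/bin/sh
# Append "ip name" to $HOSTS only if the hostname is not there yet.
HOSTS="./hosts.demo"   # use /etc/hosts on the real hosts
: > "$HOSTS"           # scratch file for the demo

add_host() {
  ip="$1"; name="$2"
  grep -qw "$name" "$HOSTS" || printf '%s    %s\n' "$ip" "$name" >> "$HOSTS"
}

add_host 192.168.75.171 mysql-master171
add_host 192.168.75.172 mysql-master172
add_host 192.168.75.173 mysql-master173
add_host 192.168.75.174 mysql-mha-manager174
add_host 192.168.75.171 mysql-master171   # re-running adds no duplicate
```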

2.3.6 Copy mha4mysql-node from mysql-mha-manager174 to the three database servers and install it

# you will be prompted for each host's password and to confirm the host key with yes
[root@mysql-mha-manager174 MHA-7]# for i in {mysql-master171,mysql-master172,mysql-master173}; do scp mha4mysql-node-0.58-0.el7.centos.noarch.rpm root@$i:/root ; done

# install mha4mysql-node on each of the three database servers
[root@mysql-master171 ~]# yum localinstall mha4mysql-node-0.58-0.el7.centos.noarch.rpm

2.3.7 On mysql-master171, create a MySQL user for MHA and grant it administrative privileges

mysql> create user 'mhaadm'@'%' identified with mysql_native_password by '123';
mysql> grant all on *.* to 'mhaadm'@'%';

Note: run these on the master only; they replicate to all slaves through the binlog, which keeps privileges consistent across the MHA cluster

2.3.8 Set up passwordless SSH on all hosts (every host in the MHA cluster needs SSH access to the others for remote control and data management)

Method 1:

a. Generate a key pair on every host
[root@mysql-mha-manager174 ~]# ssh-keygen -f /root/.ssh/id_rsa -P '' -q
b. Upload every host's public key to the mysql-mha-manager174 host
[root@mysql-master171 ~]# ssh-copy-id  root@mysql-mha-manager174
[root@mysql-master172 ~]# ssh-copy-id  root@mysql-mha-manager174
[root@mysql-master173 ~]# ssh-copy-id  root@mysql-mha-manager174
[root@mysql-mha-manager174 ~]# ssh-copy-id  root@mysql-mha-manager174
c. Distribute the manager's authorized_keys to the other hosts
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173};do scp /root/.ssh/authorized_keys  root@$i:/root/.ssh;done
d. Verify the setup
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173};do ssh $i hostname;done

mysql-master171
mysql-master172
mysql-master173

Method 2:

# push the manager's public key to every database host
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173}; do ssh-copy-id root@$i ; done
# push the manager's private key as well, so the database hosts can reach each other
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173}; do scp /root/.ssh/id_rsa root@$i:/root/.ssh; done
# verify passwordless connectivity
[root@mysql-mha-manager174 ~]# for i in {mysql-master171,mysql-master172,mysql-master173};do ssh $i hostname ;done

2.3.9 Create the configuration directories on mysql-mha-manager174

# /etc/mha holds MHA's main configuration file
# /var/log/mha/app1 is MHA's log directory
[root@mysql-mha-manager174 ~]# mkdir   -p  /etc/mha  /var/log/mha/app1

2.3.10 Edit the MHA cluster's core configuration file

[root@mysql-mha-manager174 ~]# vim  /etc/mha/app1.cnf
[server default]   # defaults shared by server1, server2 and server3
user=mhaadm    # MHA management user
password=123  # MHA management user's password
manager_workdir=/var/log/mha/app1  # MHA working directory
manager_log=/var/log/mha/app1/manager.log  # MHA log file
ssh_user=root  # user for SSH key-based access
repl_user=rep  # replication account
repl_password=123
ping_interval=1  # interval (seconds) between pings of the master
[server1]   # MySQL host entries
hostname=192.168.75.171
ssh_port=22
candidate_master=1  # this host is a candidate to become the new master
[server2]
hostname=192.168.75.172
ssh_port=22
candidate_master=1
[server3]
hostname=192.168.75.173
ssh_port=22
no_master=1  # never promote this host to master
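Before running the MHA checkers it is worth confirming that no key was mistyped or dropped from app1.cnf. A grep-based sanity check (a hypothetical helper; here it runs against an inline copy of the file, and on the manager you would point CONF at /etc/mha/app1.cnf):

```shell
#!/bin/sh
# Check that the essential MHA keys are present in the config.
CONF="./app1.cnf.demo"   # use /etc/mha/app1.cnf on the manager
cat > "$CONF" <<'EOF'
[server default]
user=mhaadm
password=123
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
ssh_user=root
repl_user=rep
repl_password=123
ping_interval=1
[server1]
hostname=192.168.75.171
[server2]
hostname=192.168.75.172
[server3]
hostname=192.168.75.173
EOF

ok=1
for key in user password ssh_user repl_user repl_password manager_workdir manager_log; do
  grep -q "^${key}=" "$CONF" || { echo "missing key: $key"; ok=0; }
done
# one hostname= line per [serverN] block
test "$(grep -c '^hostname=' "$CONF")" -eq 3 || { echo "expected 3 hostname entries"; ok=0; }
[ "$ok" -eq 1 ] && echo "app1.cnf looks complete"
```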
a. Check that SSH connectivity between the hosts is OK
[root@mysql-mha-manager174 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
b. Result
[root@mysql-mha-manager174 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat Sep  6 01:56:49 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Sep  6 01:56:49 2025 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Sep  6 01:56:49 2025 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Sep  6 01:56:49 2025 - [info] Starting SSH connection tests..
Sat Sep  6 01:56:49 2025 - [debug] 
Sat Sep  6 01:56:49 2025 - [debug]  Connecting via SSH from root@192.168.75.171(192.168.75.171:22) to root@192.168.75.172(192.168.75.172:22)..
Sat Sep  6 01:56:49 2025 - [debug]   ok.
Sat Sep  6 01:56:49 2025 - [debug]  Connecting via SSH from root@192.168.75.171(192.168.75.171:22) to root@192.168.75.173(192.168.75.173:22)..
Sat Sep  6 01:56:49 2025 - [debug]   ok.
Sat Sep  6 01:56:50 2025 - [debug] 
Sat Sep  6 01:56:49 2025 - [debug]  Connecting via SSH from root@192.168.75.172(192.168.75.172:22) to root@192.168.75.171(192.168.75.171:22)..
Sat Sep  6 01:56:49 2025 - [debug]   ok.
Sat Sep  6 01:56:49 2025 - [debug]  Connecting via SSH from root@192.168.75.172(192.168.75.172:22) to root@192.168.75.173(192.168.75.173:22)..
Sat Sep  6 01:56:50 2025 - [debug]   ok.
Sat Sep  6 01:56:51 2025 - [debug] 
Sat Sep  6 01:56:50 2025 - [debug]  Connecting via SSH from root@192.168.75.173(192.168.75.173:22) to root@192.168.75.171(192.168.75.171:22)..
Sat Sep  6 01:56:50 2025 - [debug]   ok.
Sat Sep  6 01:56:50 2025 - [debug]  Connecting via SSH from root@192.168.75.173(192.168.75.173:22) to root@192.168.75.172(192.168.75.172:22)..
Sat Sep  6 01:56:50 2025 - [debug]   ok.
Sat Sep  6 01:56:51 2025 - [info] All SSH connection tests passed successfully.
c. Check the replication configuration of the MySQL cluster
[root@mysql-mha-manager174 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
d. Result
[root@mysql-mha-manager174 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Sat Sep  6 01:57:12 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Sep  6 01:57:12 2025 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Sep  6 01:57:12 2025 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Sep  6 01:57:12 2025 - [info] MHA::MasterMonitor version 0.58.
Sat Sep  6 01:57:13 2025 - [info] GTID failover mode = 1
Sat Sep  6 01:57:13 2025 - [info] Dead Servers:
Sat Sep  6 01:57:13 2025 - [info] Alive Servers:
Sat Sep  6 01:57:13 2025 - [info]   192.168.75.171(192.168.75.171:3306)
Sat Sep  6 01:57:13 2025 - [info]   192.168.75.172(192.168.75.172:3306)
Sat Sep  6 01:57:13 2025 - [info]   192.168.75.173(192.168.75.173:3306)
Sat Sep  6 01:57:13 2025 - [info] Alive Slaves:
Sat Sep  6 01:57:13 2025 - [info]   192.168.75.172(192.168.75.172:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 01:57:13 2025 - [info]     GTID ON
Sat Sep  6 01:57:13 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 01:57:13 2025 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Sep  6 01:57:13 2025 - [info]   192.168.75.173(192.168.75.173:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 01:57:13 2025 - [info]     GTID ON
Sat Sep  6 01:57:13 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 01:57:13 2025 - [info]     Not candidate for the new Master (no_master is set)
Sat Sep  6 01:57:13 2025 - [info] Current Alive Master: 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 01:57:13 2025 - [info] Checking slave configurations..
Sat Sep  6 01:57:13 2025 - [info]  read_only=1 is not set on slave 192.168.75.172(192.168.75.172:3306).
Sat Sep  6 01:57:13 2025 - [info]  read_only=1 is not set on slave 192.168.75.173(192.168.75.173:3306).
Sat Sep  6 01:57:13 2025 - [info] Checking replication filtering settings..
Sat Sep  6 01:57:13 2025 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Sep  6 01:57:13 2025 - [info]  Replication filtering check ok.
Sat Sep  6 01:57:13 2025 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Sep  6 01:57:13 2025 - [info] Checking SSH publickey authentication settings on the current master..
Sat Sep  6 01:57:13 2025 - [info] HealthCheck: SSH to 192.168.75.171 is reachable.
Sat Sep  6 01:57:13 2025 - [info] 
192.168.75.171(192.168.75.171:3306) (current master)
 +--192.168.75.172(192.168.75.172:3306)
 +--192.168.75.173(192.168.75.173:3306)

Sat Sep  6 01:57:13 2025 - [info] Checking replication health on 192.168.75.172..
Sat Sep  6 01:57:13 2025 - [info]  ok.
Sat Sep  6 01:57:13 2025 - [info] Checking replication health on 192.168.75.173..
Sat Sep  6 01:57:13 2025 - [info]  ok.
Sat Sep  6 01:57:13 2025 - [warning] master_ip_failover_script is not defined.
Sat Sep  6 01:57:13 2025 - [warning] shutdown_script is not defined.
Sat Sep  6 01:57:13 2025 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

3. MHA Failover

3.1 Concept

When the master fails, the Manager detects it, automatically promotes the slave holding the most up-to-date data to be the new master, repoints the other slaves at it, and preserves data consistency. Manual failover with MHA is fairly tedious; automatic failover is far more convenient.

3.2 Manual master failover

3.2.1 Simulate a failure of mysql-master171

[root@mysql-master171 ~]# /etc/init.d/mysqld stop

3.2.2 Perform the failover from mysql-mha-manager174 (--ignore_last_failover tells MHA to ignore the lock file that a previous switchover left under /var/log/mha/app1/)

[root@mysql-mha-manager174 ~]# masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=192.168.75.171 --dead_master_port=3306 --new_master_host=192.168.75.172 --new_master_port=3306 --ignore_last_failover

# masterha_master_switch             performs a manual/automatic switchover
# --master_state=dead                the original master has failed
# --conf=/etc/mha/app1.cnf           path to the MHA cluster configuration (node info, credentials, etc.)
# --dead_master_host=192.168.75.171  the failed master
# --dead_master_port=3306            the failed master's MySQL port
# --new_master_host=192.168.75.172   the host to promote to new master
# --new_master_port=3306             the new master's MySQL port
# --ignore_last_failover             ignore the cool-down from the previous failover
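Because this command line is long, a stray space (for example `-- dead_master_port=...`, which MHA then treats as an unset option) is easy to miss. A dry-run wrapper sketch (hypothetical, not an MHA tool) that assembles the command from variables and prints it for review before execution:

```shell
#!/bin/sh
# Assemble the manual-failover command; review the echoed line,
# then run it (e.g. with: eval "$CMD") once it looks right.
CONF=/etc/mha/app1.cnf
DEAD_HOST=192.168.75.171
NEW_HOST=192.168.75.172
PORT=3306

CMD="masterha_master_switch --master_state=dead --conf=$CONF \
--dead_master_host=$DEAD_HOST --dead_master_port=$PORT \
--new_master_host=$NEW_HOST --new_master_port=$PORT \
--ignore_last_failover"

echo "$CMD"
```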

Result (mysql-master172 is now the master)

[root@mysql-mha-manager174 ~]# masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=192.168.75.171 -- dead_master_port=3306 --new_master_host=192.168.75.172 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.75.171.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Sat Sep  6 02:07:01 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Sep  6 02:07:01 2025 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Sep  6 02:07:01 2025 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Sep  6 02:07:01 2025 - [info] MHA::MasterFailover version 0.58.
Sat Sep  6 02:07:01 2025 - [info] Starting master failover.
Sat Sep  6 02:07:01 2025 - [info] 
Sat Sep  6 02:07:01 2025 - [info] * Phase 1: Configuration Check Phase..
Sat Sep  6 02:07:01 2025 - [info] 
Sat Sep  6 02:07:02 2025 - [info] GTID failover mode = 1
Sat Sep  6 02:07:02 2025 - [info] Dead Servers:
Sat Sep  6 02:07:02 2025 - [info]   192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:02 2025 - [info] Checking master reachability via MySQL(double check)...
Sat Sep  6 02:07:02 2025 - [info]  ok.
Sat Sep  6 02:07:02 2025 - [info] Alive Servers:
Sat Sep  6 02:07:02 2025 - [info]   192.168.75.172(192.168.75.172:3306)
Sat Sep  6 02:07:02 2025 - [info]   192.168.75.173(192.168.75.173:3306)
Sat Sep  6 02:07:02 2025 - [info] Alive Slaves:
Sat Sep  6 02:07:02 2025 - [info]   192.168.75.172(192.168.75.172:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 02:07:02 2025 - [info]     GTID ON
Sat Sep  6 02:07:02 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:02 2025 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Sep  6 02:07:02 2025 - [info]   192.168.75.173(192.168.75.173:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 02:07:02 2025 - [info]     GTID ON
Sat Sep  6 02:07:02 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:02 2025 - [info]     Not candidate for the new Master (no_master is set)
Master 192.168.75.171(192.168.75.171:3306) is dead. Proceed? (yes/NO): yes
Sat Sep  6 02:07:12 2025 - [info] Starting GTID based failover.
Sat Sep  6 02:07:12 2025 - [info] 
Sat Sep  6 02:07:12 2025 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Sep  6 02:07:12 2025 - [info] 
Sat Sep  6 02:07:12 2025 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Sep  6 02:07:12 2025 - [info] 
Sat Sep  6 02:07:12 2025 - [info] HealthCheck: SSH to 192.168.75.171 is reachable.
Sat Sep  6 02:07:12 2025 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Sep  6 02:07:12 2025 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Sat Sep  6 02:07:12 2025 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Sep  6 02:07:12 2025 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Sep  6 02:07:12 2025 - [info] 
Sat Sep  6 02:07:12 2025 - [info] * Phase 3: Master Recovery Phase..
Sat Sep  6 02:07:12 2025 - [info] 
Sat Sep  6 02:07:12 2025 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Sep  6 02:07:12 2025 - [info] 
Sat Sep  6 02:07:12 2025 - [info] The latest binary log file/position on all slaves is binlog.000003:197
Sat Sep  6 02:07:12 2025 - [info] Retrieved Gtid Set: c109ec67-896c-11f0-90d5-000c294df9ee:1-7
Sat Sep  6 02:07:12 2025 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Sep  6 02:07:12 2025 - [info]   192.168.75.172(192.168.75.172:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 02:07:12 2025 - [info]     GTID ON
Sat Sep  6 02:07:12 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:12 2025 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Sep  6 02:07:12 2025 - [info]   192.168.75.173(192.168.75.173:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 02:07:12 2025 - [info]     GTID ON
Sat Sep  6 02:07:12 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:12 2025 - [info]     Not candidate for the new Master (no_master is set)
Sat Sep  6 02:07:12 2025 - [info] The oldest binary log file/position on all slaves is binlog.000003:197
Sat Sep  6 02:07:12 2025 - [info] Retrieved Gtid Set: c109ec67-896c-11f0-90d5-000c294df9ee:1-7
Sat Sep  6 02:07:12 2025 - [info] Oldest slaves:
Sat Sep  6 02:07:12 2025 - [info]   192.168.75.172(192.168.75.172:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 02:07:12 2025 - [info]     GTID ON
Sat Sep  6 02:07:12 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:12 2025 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Sep  6 02:07:12 2025 - [info]   192.168.75.173(192.168.75.173:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 02:07:12 2025 - [info]     GTID ON
Sat Sep  6 02:07:12 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:12 2025 - [info]     Not candidate for the new Master (no_master is set)
Sat Sep  6 02:07:12 2025 - [info] 
Sat Sep  6 02:07:12 2025 - [info] * Phase 3.3: Determining New Master Phase..
Sat Sep  6 02:07:12 2025 - [info] 
Sat Sep  6 02:07:12 2025 - [info] Searching new master from slaves..
Sat Sep  6 02:07:12 2025 - [info]  Candidate masters from the configuration file:
Sat Sep  6 02:07:12 2025 - [info]   192.168.75.172(192.168.75.172:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 02:07:12 2025 - [info]     GTID ON
Sat Sep  6 02:07:12 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:12 2025 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Sep  6 02:07:12 2025 - [info]  Non-candidate masters:
Sat Sep  6 02:07:12 2025 - [info]   192.168.75.173(192.168.75.173:3306)  Version=8.0.40 (oldest major version between slaves) log-bin:enabled
Sat Sep  6 02:07:12 2025 - [info]     GTID ON
Sat Sep  6 02:07:12 2025 - [info]     Replicating from 192.168.75.171(192.168.75.171:3306)
Sat Sep  6 02:07:12 2025 - [info]     Not candidate for the new Master (no_master is set)
Sat Sep  6 02:07:12 2025 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Sep  6 02:07:12 2025 - [info] New master is 192.168.75.172(192.168.75.172:3306)
Sat Sep  6 02:07:12 2025 - [info] Starting master failover..
Sat Sep  6 02:07:12 2025 - [info] 
From:
192.168.75.171(192.168.75.171:3306) (current master)
 +--192.168.75.172(192.168.75.172:3306)
 +--192.168.75.173(192.168.75.173:3306)

To:
192.168.75.172(192.168.75.172:3306) (new master)
 +--192.168.75.173(192.168.75.173:3306)

Starting master switch from 192.168.75.171(192.168.75.171:3306) to 192.168.75.172(192.168.75.172:3306)? (yes/NO): yes
Sat Sep  6 02:07:14 2025 - [info] New master decided manually is 192.168.75.172(192.168.75.172:3306)
Sat Sep  6 02:07:14 2025 - [info] 
Sat Sep  6 02:07:14 2025 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Sep  6 02:07:14 2025 - [info] 
Sat Sep  6 02:07:14 2025 - [info]  Waiting all logs to be applied.. 
Sat Sep  6 02:07:14 2025 - [info]   done.
Sat Sep  6 02:07:14 2025 - [info] Getting new master's binlog name and position..
Sat Sep  6 02:07:14 2025 - [info]  binlog.000003:1141
Sat Sep  6 02:07:14 2025 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.75.172', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Sat Sep  6 02:07:14 2025 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: binlog.000003, 1141, 3ea421e4-896e-11f0-988a-000c29c5caee:1,
c109ec67-896c-11f0-90d5-000c294df9ee:1-7
Sat Sep  6 02:07:14 2025 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Sat Sep  6 02:07:14 2025 - [info] ** Finished master recovery successfully.
Sat Sep  6 02:07:14 2025 - [info] * Phase 3: Master Recovery Phase completed.
Sat Sep  6 02:07:14 2025 - [info] 
Sat Sep  6 02:07:14 2025 - [info] * Phase 4: Slaves Recovery Phase..
Sat Sep  6 02:07:14 2025 - [info] 
Sat Sep  6 02:07:14 2025 - [info] 
Sat Sep  6 02:07:14 2025 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Sep  6 02:07:14 2025 - [info] 
Sat Sep  6 02:07:14 2025 - [info] -- Slave recovery on host 192.168.75.173(192.168.75.173:3306) started, pid: 2914. Check tmp log /var/log/mha/app1/192.168.75.173_3306_20250906020701.log if it takes time..
Sat Sep  6 02:07:16 2025 - [info] 
Sat Sep  6 02:07:16 2025 - [info] Log messages from 192.168.75.173 ...
Sat Sep  6 02:07:16 2025 - [info] 
Sat Sep  6 02:07:14 2025 - [info]  Resetting slave 192.168.75.173(192.168.75.173:3306) and starting replication from the new master 192.168.75.172(192.168.75.172:3306)..
Sat Sep  6 02:07:14 2025 - [info]  Executed CHANGE MASTER.
Sat Sep  6 02:07:15 2025 - [info]  Slave started.
Sat Sep  6 02:07:15 2025 - [info]  gtid_wait(3ea421e4-896e-11f0-988a-000c29c5caee:1,
c109ec67-896c-11f0-90d5-000c294df9ee:1-7) completed on 192.168.75.173(192.168.75.173:3306). Executed 0 events.
Sat Sep  6 02:07:16 2025 - [info] End of log messages from 192.168.75.173.
Sat Sep  6 02:07:16 2025 - [info] -- Slave on host 192.168.75.173(192.168.75.173:3306) started.
Sat Sep  6 02:07:16 2025 - [info] All new slave servers recovered successfully.
Sat Sep  6 02:07:16 2025 - [info] 
Sat Sep  6 02:07:16 2025 - [info] * Phase 5: New master cleanup phase..
Sat Sep  6 02:07:16 2025 - [info] 
Sat Sep  6 02:07:16 2025 - [info] Resetting slave info on the new master..
Sat Sep  6 02:07:16 2025 - [info]  192.168.75.172: Resetting slave info succeeded.
Sat Sep  6 02:07:16 2025 - [info] Master failover to 192.168.75.172(192.168.75.172:3306) completed successfully.
Sat Sep  6 02:07:16 2025 - [info] 

----- Failover Report -----

app: MySQL Master failover 192.168.75.171(192.168.75.171:3306) to 192.168.75.172(192.168.75.172:3306) succeeded

Master 192.168.75.171(192.168.75.171:3306) is down!

Check MHA Manager logs at mysql-mha-manager174 for details.

Started manual(interactive) failover.
Selected 192.168.75.172(192.168.75.172:3306) as a new master.
192.168.75.172(192.168.75.172:3306): OK: Applying all logs succeeded.
192.168.75.173(192.168.75.173:3306): OK: Slave started, replicating from 192.168.75.172(192.168.75.172:3306)
192.168.75.172(192.168.75.172:3306): Resetting slave info succeeded.
Master failover to 192.168.75.172(192.168.75.172:3306) completed successfully.

3.2.3 Recover the failed MySQL node

a. Rejoin mysql-master171 to the cluster
[root@mysql-master171 ~]# /etc/init.d/mysqld start
Starting MySQL. SUCCESS! 
[root@mysql-master171 ~]# mysql -uroot -p123
mysql> CHANGE REPLICATION SOURCE TO
    -> SOURCE_HOST='192.168.75.172',
    -> SOURCE_USER='rep',
    -> SOURCE_PASSWORD='123',
    -> SOURCE_AUTO_POSITION=1,
    -> SOURCE_SSL=1;
Query OK, 0 rows affected, 2 warnings (0.07 sec)
b. Start replication
mysql> start replica;
Query OK, 0 rows affected (0.01 sec)
c. Check the replication status
mysql> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 192.168.75.172    # the master is now mysql-master172
                  Source_User: rep
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000004
          Read_Source_Log_Pos: 237
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 367
        Relay_Source_Log_File: binlog.000004
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
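Rather than eyeballing the \G output, the two Running flags can be checked mechanically. A sketch (hypothetical helper) that parses a captured `show replica status` dump; here the dump is inlined, but it could also come from `mysql -e 'show replica status\G'`:

```shell
#!/bin/sh
# Pull Replica_IO_Running / Replica_SQL_Running out of a captured
# "show replica status\G" dump and report replication health.
STATUS='             Replica_IO_State: Waiting for source to send event
                  Source_Host: 192.168.75.172
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes'

io=$(printf '%s\n' "$STATUS"  | awk -F': *' '/Replica_IO_Running/  {print $2}')
sql=$(printf '%s\n' "$STATUS" | awk -F': *' '/Replica_SQL_Running/ {print $2}')

if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
  echo "replication healthy"
else
  echo "replication broken: IO=$io SQL=$sql"
fi
```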

3.2.4 Start MHA so that MySQL fails over automatically

a. Delete the lock file
# if it is left in place, the next MHA failover cannot succeed
# (alternatively, pass --ignore_last_failover to ignore the lock file)
[root@mysql-mha-manager174 ~]# rm -rf /var/log/mha/app1/app1.failover.complete
b. Start MHA
[root@mysql-mha-manager174 ~]# masterha_manager --conf=/etc/mha/app1.cnf
c. Check MHA's status
[root@mysql-mha-manager174 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
d. Simulate a crash of the current master, mysql-master172
[root@mysql-master172 ~]# /etc/init.d/mysqld  stop

Note: once the failover completes, masterha_manager stops automatically

e. Inspect the failover log
[root@mysql-mha-manager174 ~]# cat /var/log/mha/app1/manager.log
f. On mysql-master173, check which host is now the master (it has switched to mysql-master171)
mysql> show replica status \G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 192.168.75.171
                  Source_User: rep
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000004
          Read_Source_Log_Pos: 524
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 411
        Relay_Source_Log_File: binlog.000004
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
g. Delete the lock file
[root@mysql-mha-manager174 ~]# rm -rf  /var/log/mha/app1/app1.failover.complete
h. Recover the failed mysql-master172
Start mysql-master172
[root@mysql-master172 ~]# /etc/init.d/mysqld start
Starting MySQL. SUCCESS! 
Configure it as a slave
[root@mysql-master172 ~]# mysql -uroot -p123
mysql> CHANGE REPLICATION SOURCE TO
    -> SOURCE_HOST='192.168.75.171',
    -> SOURCE_USER='rep',
    -> SOURCE_PASSWORD='123',
    -> SOURCE_AUTO_POSITION=1,
    -> SOURCE_SSL=1;
Query OK, 0 rows affected, 2 warnings (0.06 sec)
Start the replication threads
mysql> start replica;
Query OK, 0 rows affected (0.02 sec)

4. Configuring a VIP

4.1 Concept

There are two ways to manage the virtual IP: let keepalived float it, or move it with a script invoked by MHA. To avoid split-brain, the script approach is recommended in production; it needs no extra software such as keepalived and is more reliable.

4.2 Configuration

4.2.1 Write the failover script

a. Place the script under /usr/local/bin/

[root@mysql-mha-manager174 ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
# note: adjust the VIP address and NIC name below for your environment
my $vip = '192.168.75.88/24';
my $ssh_start_vip = "/sbin/ip a add $vip dev eth0";
my $ssh_stop_vip = "/sbin/ip a del $vip dev eth0";

GetOptions(
  'command=s' => \$command,
  'ssh_user=s' => \$ssh_user,
  'orig_master_host=s' => \$orig_master_host,
  'orig_master_ip=s' => \$orig_master_ip,
  'orig_master_port=i' => \$orig_master_port,
  'new_master_host=s' => \$new_master_host,
  'new_master_ip=s' => \$new_master_ip,
  'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {
  print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
  if ( $command eq "stop" || $command eq "stopssh" ) {
    my $exit_code = 1;
    eval {
      print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
        warn "Got Error: $@\n";
        exit $exit_code;
    }
    exit $exit_code;
    }
  elsif ( $command eq "start" ) {
    my $exit_code = 10;
    eval {
      print "Enabling the VIP - $vip on the new master - $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {
    print "Checking the Status of the script.. OK \n";
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}
sub start_vip() {
  `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
  return 0 unless ($ssh_user);
  `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
  print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

b. Make the script executable

[root@mysql-mha-manager174 ~]# chmod +x /usr/local/bin/master_ip_failover

4.2.2 Update the MHA configuration file on mysql-mha-manager174

# add the following line under [server default]
[root@mysql-mha-manager174 ~]# vim /etc/mha/app1.cnf
[server default]
master_ip_failover_script=/usr/local/bin/master_ip_failover   # script that moves the VIP when the master fails over

4.2.3 On mysql-master171, assign the VIP manually the first time

[root@mysql-master171 ~]# ip a add 192.168.75.88/24 dev eth0
[root@mysql-master171 ~]# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:d6:ad:c7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.75.171/24 brd 192.168.75.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 192.168.75.88/24 scope global secondary eth0

4.2.4 Start MHA on mysql-mha-manager174 (it runs in the foreground, so open another terminal for the following steps)

[root@mysql-mha-manager174 ~]# masterha_manager --conf=/etc/mha/app1.cnf --ignore_last_failover

4.2.5 Stop the MySQL service on mysql-master171 to simulate a master crash

[root@mysql-master171 ~]# /etc/init.d/mysqld  stop

4.2.6 Verify

a. Check the VIP on mysql-master172
[root@mysql-master172 ~]# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:c5:ca:ee brd ff:ff:ff:ff:ff:ff
    inet 192.168.75.172/24 brd 192.168.75.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 192.168.75.88/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::75c2:222:6e68:3095/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::269a:d353:bdad:ccfc/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

