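Because of how keepalived itself works, the container-level goal cannot be achieved: two keepalived instances and two MySQL instances in master-slave replication on a single host, with automatic failover to the remaining healthy node. The reasons are given at the end.
The database master-slave replication part of this setup is still usable.
Preparation
Create the network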
docker network create \
--subnet=192.168.10.0/24 \
--gateway=192.168.10.1 \
--driver=bridge docker-mysql-ha
Startup script
services:
  mysql-master:
    container_name: mysql-master
    hostname: mysql-master
    image: mysql:8.0
    networks:
      docker-mysql-ha:
        ipv4_address: 192.168.10.11
    environment:
      TZ: Asia/Shanghai
      MYSQL_ROOT_PASSWORD: 20240510
      MYSQL_USER: kd1
      MYSQL_PASSWORD: 20240511
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data/dockerfiles/mysql-master/mysql_data:/var/lib/mysql
      - /data/dockerfiles/mysql-master/conf/my.cnf:/etc/my.cnf
      - /data/dockerfiles/mysql-master/mysqld:/var/run/mysqld
      - /data/dockerfiles/mysql-master/log:/var/log/mysql
    restart: unless-stopped
  mysql-slave:
    container_name: mysql-slave
    hostname: mysql-slave
    image: mysql:8.0
    networks:
      docker-mysql-ha:
        ipv4_address: 192.168.10.12
    environment:
      TZ: Asia/Shanghai
      MYSQL_ROOT_PASSWORD: 20240510
      MYSQL_USER: kd1
      MYSQL_PASSWORD: 20240511
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data/dockerfiles/mysql-slave/mysql_data:/var/lib/mysql
      - /data/dockerfiles/mysql-slave/conf/my.cnf:/etc/my.cnf
      - /data/dockerfiles/mysql-slave/mysqld:/var/run/mysqld
      - /data/dockerfiles/mysql-slave/log:/var/log/mysql
    restart: unless-stopped
networks:
  docker-mysql-ha:
    external: true
Create the persistent storage directories
mkdir -p /data/dockerfiles/mysql-master/conf/
mkdir -p /data/dockerfiles/mysql-slave/conf/
Add the configuration files
Master node configuration
my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/8.0/en/server-configuration-defaults.html
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
# Remove leading # to revert to previous value for default_authentication_plugin,
# this will increase compatibility with older clients. For background, see:
# https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_default_authentication_plugin
# default-authentication-plugin=mysql_native_password
# Basic path settings
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
user=mysql
pid-file=/var/run/mysqld/mysqld.pid
secure-file-priv=/var/lib/mysql-files
# Network and connections
skip-host-cache
skip-name-resolve
max_connections=200
wait_timeout=28800
interactive_timeout=28800
# Character set and collation
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
# Server time zone (ideally the same as the host)
default-time-zone='+08:00'
# Binary log (recommended; used for data recovery and master-slave replication)
server-id=1
binlog_format=ROW
log_bin=/var/log/mysql/mysql-bin.log
binlog_expire_logs_seconds=604800 # keep 7 days
sync_binlog=1
# Slow query log (key for tuning)
slow_query_log=1
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=1
log_queries_not_using_indexes=1
# Error log
# log_error=/var/log/mysql/mysql-error.log
# InnoDB engine tuning
innodb_buffer_pool_size=512M
innodb_redo_log_capacity=256M
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=1
# Cache tuning
table_open_cache=400
tmp_table_size=64M
max_heap_table_size=64M
sort_buffer_size=4M
read_buffer_size=2M
join_buffer_size=4M
[client]
socket=/var/run/mysqld/mysqld.sock
default-character-set=utf8mb4
[mysqldump]
default-character-set=utf8mb4
quick
max_allowed_packet=64M
!includedir /etc/mysql/conf.d/
Slave node configuration
my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/8.0/en/server-configuration-defaults.html
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
# Remove leading # to revert to previous value for default_authentication_plugin,
# this will increase compatibility with older clients. For background, see:
# https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_default_authentication_plugin
# default-authentication-plugin=mysql_native_password
# Basic path settings
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
user=mysql
pid-file=/var/run/mysqld/mysqld.pid
secure-file-priv=/var/lib/mysql-files
# Network and connections
skip-host-cache
skip-name-resolve
max_connections=200
wait_timeout=28800
interactive_timeout=28800
# Character set and collation
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
# Server time zone (ideally the same as the host)
default-time-zone='+08:00'
# Binary log (recommended; used for data recovery and master-slave replication)
server-id=2
relay-log=/var/lib/mysql/relay-log
binlog_format=ROW
log_bin=/var/log/mysql/mysql-bin.log
binlog_expire_logs_seconds=604800 # keep 7 days
sync_binlog=1
# Slow query log (key for tuning)
slow_query_log=1
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=1
log_queries_not_using_indexes=1
# Error log
# log_error=/var/log/mysql/mysql-error.log
# InnoDB engine tuning
innodb_buffer_pool_size=512M
innodb_redo_log_capacity=256M
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=1
# Cache tuning
table_open_cache=400
tmp_table_size=64M
max_heap_table_size=64M
sort_buffer_size=4M
read_buffer_size=2M
join_buffer_size=4M
[client]
socket=/var/run/mysqld/mysqld.sock
default-character-set=utf8mb4
[mysqldump]
default-character-set=utf8mb4
quick
max_allowed_packet=64M
!includedir /etc/mysql/conf.d/
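The two my.cnf files differ only in server-id (1 on the master, 2 on the slave) and the slave's relay-log line, and replication depends on the two server-id values being different. Since the files are otherwise copies of each other, a quick check before starting the containers can catch a forgotten edit. The following is a minimal sketch; the function names `server_id_of` and `distinct_server_ids` are made up for this example.

```shell
# Print the server-id value found in the [mysqld] section of a my.cnf file.
server_id_of() {
  awk -F= '
    /^\[/ { in_mysqld = ($0 == "[mysqld]") }
    in_mysqld && $1 ~ /^[ \t]*server[-_]id[ \t]*$/ {
      gsub(/[ \t]/, "", $2); print $2; exit
    }' "$1"
}

# Succeed only if both files define server-id and the two values differ.
distinct_server_ids() {
  id_a=$(server_id_of "$1")
  id_b=$(server_id_of "$2")
  [ -n "$id_a" ] && [ -n "$id_b" ] && [ "$id_a" != "$id_b" ]
}

# Example (paths from this article, not run here):
# distinct_server_ids /data/dockerfiles/mysql-master/conf/my.cnf \
#                     /data/dockerfiles/mysql-slave/conf/my.cnf \
#   || echo "server-id must be set and unique on each node"
```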
Start the containers
docker compose -f /data/dockercompose/docker-compose-mysql-ha.yml up -d
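On first start the MySQL containers need some time to initialize the data directory, so the `docker exec ... mysql` commands in the next section can fail if they are run immediately. One way to handle this is a small retry loop. The sketch below is generic; `wait_until` is a hypothetical helper name, and the `mysqladmin ping` call in the usage comment assumes the client tools shipped in the mysql:8.0 image.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the command quietly; stop as soon as it succeeds.
    "$@" >/dev/null 2>&1 && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): wait up to about 60 s for the master to answer.
# wait_until 30 2 docker exec mysql-master mysqladmin ping -uroot -p20240510
```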
MySQL master-slave replication setup
Master configuration
docker exec -it mysql-master mysql -uroot -p20240510
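# Create a new user named replicator. '%' allows connections from any IP address. The account password is replica123.
mysql> CREATE USER 'replicator'@'%' IDENTIFIED BY 'replica123';
# Grant the REPLICATION SLAVE privilege to replicator. It lets the user connect to the master as a slave and replicate the binary log. *.* applies it to all databases and tables.
mysql> GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
# Flush the privilege tables:
mysql> FLUSH PRIVILEGES;
# Check the current binlog status. Note File and Position; the slave needs them later.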
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000005 | 868 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
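Copying File and Position by hand is easy to get wrong. As a sketch, the two values can be read from the tab-separated output that the `mysql` client prints in batch mode (`-N -B`) and turned into the CHANGE MASTER TO statement used on the slave. The helper name `build_change_master` is invented for this example; the host, user and password are the ones used in this article.

```shell
# Read one tab-separated "File<TAB>Position..." line from stdin and print
# the matching CHANGE MASTER TO statement for the slave.
build_change_master() {
  IFS="$(printf '\t')" read -r log_file log_pos _rest || true
  [ -n "$log_file" ] || return 1
  # Position must be a plain non-negative integer.
  case "$log_pos" in
    ''|*[!0-9]*) return 1 ;;
  esac
  cat <<EOF
CHANGE MASTER TO
  MASTER_HOST='192.168.10.11',
  MASTER_PORT=3306,
  MASTER_USER='replicator',
  MASTER_PASSWORD='replica123',
  MASTER_LOG_FILE='${log_file}',
  MASTER_LOG_POS=${log_pos},
  GET_MASTER_PUBLIC_KEY=1;
EOF
}

# Example (not run here):
# docker exec mysql-master mysql -uroot -p20240510 -N -B -e 'SHOW MASTER STATUS' \
#   | build_change_master
```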
Slave configuration
docker exec -it mysql-slave mysql -uroot -p20240510
# Set the master connection information on the slave
mysql> CHANGE MASTER TO
MASTER_HOST='192.168.10.11',
MASTER_PORT=3306,
MASTER_USER='replicator',
MASTER_PASSWORD='replica123',
MASTER_LOG_FILE='mysql-bin.000005', -- replace with File from SHOW MASTER STATUS on the master
MASTER_LOG_POS=868, -- replace with Position from the same output
GET_MASTER_PUBLIC_KEY = 1;
# Start replication
mysql> START SLAVE;
# Check the replication status
mysql> SHOW SLAVE STATUS\G
Confirm that both of the following are Yes:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Sample output:
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 192.168.10.11
Master_User: replicator
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 868
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 326
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes -- should be Yes
Slave_SQL_Running: Yes -- should be Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 868
Relay_Log_Space: 530
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: b8b661ab-6d15-11f0-be67-0242c0a80a0b
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 1
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
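The check on Slave_IO_Running and Slave_SQL_Running can also be scripted instead of read by eye. The following is a minimal sketch that reads the `\G`-style output on stdin and succeeds only when both fields are Yes; `replication_healthy` is a hypothetical name, and the field names are the ones shown in the sample output above.

```shell
# Succeed only if both replication threads report "Yes" in the
# vertical (\G) output of SHOW SLAVE STATUS read from stdin.
replication_healthy() {
  awk -F': *' '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t\r]+$/, "", val)
      if (key == "Slave_IO_Running")  io  = val
      if (key == "Slave_SQL_Running") sql = val
    }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}

# Example (not run here):
# docker exec mysql-slave mysql -uroot -p20240510 -e 'SHOW SLAVE STATUS\G' \
#   | replication_healthy && echo "replication OK"
```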
Verify synchronization
- On the master
CREATE DATABASE test_sync;
USE test_sync;
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100));
INSERT INTO users VALUES (1, 'Alice');
- Verify on the slave
SELECT * FROM test_sync.users;
# Sample output
+----+-------+
| id | name |
+----+-------+
| 1 | Alice |
+----+-------+
1 row in set (0.00 sec)
Changing the replication configuration
- Master
docker exec -it mysql-master mysql -uroot -p20240510
-- Make sure the user created earlier for replication still exists
mysql> SELECT user, host FROM mysql.user WHERE user = 'replicator';
+------------+------+
| user | host |
+------------+------+
| replicator | % |
+------------+------+
1 row in set (0.00 sec)
-- Check the current binlog status again. Note File and Position; the slave needs them later.
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000011 | 756 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
- Slave
docker exec -it mysql-slave mysql -uroot -p20240510
-- Stop the IO thread of the previously configured replication
STOP REPLICA IO_THREAD;
-- Set the master connection information on the slave
CHANGE MASTER TO
MASTER_HOST='192.168.0.8',
MASTER_PORT=3306,
MASTER_USER='replicator',
MASTER_PASSWORD='replica123',
MASTER_LOG_FILE='mysql-bin.000011',
MASTER_LOG_POS=756, -- must match Position from SHOW MASTER STATUS on the master
GET_MASTER_PUBLIC_KEY = 1;
-- Start replication
mysql> START SLAVE;
-- Check the replication status
mysql> SHOW SLAVE STATUS\G
Confirm that both of the following are Yes:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Verify synchronization
- On the master
USE test_sync;
INSERT INTO users VALUES (5, 'Slice');
- Verify on the slave
SELECT * FROM test_sync.users;
# Sample output
+----+-------+
| id | name |
+----+-------+
| 1 | Alice |
| 5 | Slice |
+----+-------+
2 rows in set (0.00 sec)
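Removing replication from the slave
-- Stop the replication threads
STOP SLAVE;
-- Reset the replication configuration (clears all replication-related information)
RESET SLAVE ALL;
-- Check whether read_only is enabled; ON means enabled, OFF means disabled
SHOW VARIABLES LIKE 'read_only';
-- (Optional) To turn the slave into an ordinary, writable instance, run:
SET GLOBAL read_only = OFF;
-- (Optional) Check the replication status to confirm removal; an empty result (or an error) means replication is no longer configured.
SHOW SLAVE STATUS\G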
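Install the Keepalived containers (two)
- Two Keepalived containers are required, one paired with mysql-master and one with mysql-slave.
The Keepalived containers share the host's network namespace (host mode) so that they can bind the VIP:
# Create the configuration directories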
mkdir -p /data/dockerfiles/keepalived/master
mkdir -p /data/dockerfiles/keepalived/slave
Write the Keepalived configuration
Note: interface must be the name of the network interface that corresponds to the docker-mysql-ha network (192.168.10.x).
- Master side:
File: /data/dockerfiles/keepalived/master/keepalived.conf
vrrp_script check_mysql {
    script "/usr/local/etc/keepalived/check_mysql.sh 192.168.10.11"
    interval 2
    # The drop must exceed the 50-point gap between priority 150 and 100,
    # otherwise a failed check never pushes this node below the backup.
    weight -60
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        172.21.147.250
    }
    track_script {
        check_mysql
    }
}
- Script: /data/dockerfiles/keepalived/master/check_mysql.sh
#!/bin/bash
CHECK_HOST=$1
CHECK_PORT=3306
timeout 1 bash -c "echo > /dev/tcp/${CHECK_HOST}/${CHECK_PORT}"
if [ $? -eq 0 ]; then
exit 0
else
exit 1
fi
- Slave side:
File: /data/dockerfiles/keepalived/slave/keepalived.conf
vrrp_script check_mysql {
    script "/usr/local/etc/keepalived/check_mysql.sh 192.168.10.12"
    interval 2
    # Same weight as on the master side so both nodes are penalized equally.
    weight -60
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        172.21.147.250
    }
    track_script {
        check_mysql
    }
}
- Script: /data/dockerfiles/keepalived/slave/check_mysql.sh
#!/bin/bash
CHECK_HOST=$1
CHECK_PORT=3306
timeout 1 bash -c "echo > /dev/tcp/${CHECK_HOST}/${CHECK_PORT}"
if [ $? -eq 0 ]; then
exit 0
else
exit 1
fi
- Don't forget to make the scripts executable:
chmod +x /data/dockerfiles/keepalived/*/check_mysql.sh
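Whether a failed MySQL check actually moves the VIP depends on the arithmetic between priority and the track_script weight. As I understand keepalived's semantics, a negative weight is subtracted from the node's priority while the script fails, and a positive weight is added while it succeeds. With priorities 150 and 100, the master only loses the election if its effective priority drops below 100, so the penalty has to be larger than 50: a weight of -30 would leave it at 120. The sketch below just does that calculation; `effective_priority` is a made-up helper, not part of keepalived.

```shell
# effective_priority <base_priority> <weight> <check_ok: 1 or 0>
# Models how a tracked script's weight changes a VRRP priority.
effective_priority() {
  base=$1
  weight=$2
  ok=$3
  if [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
    # Negative weight: penalty applied while the check fails.
    echo $((base + weight))
  elif [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
    # Positive weight: bonus applied while the check succeeds.
    echo $((base + weight))
  else
    echo "$base"
  fi
}

effective_priority 150 -30 0   # master with a failed check and weight -30
effective_priority 150 -60 0   # master with a failed check and weight -60
effective_priority 100 -60 1   # backup with a healthy check
```

Comparing the first two results against the backup's priority shows which weight is large enough to trigger a failover.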
Startup script
Make sure the containers use host networking (otherwise the VIP cannot be bound):
version: "3.8"
services:
  keepalived-master:
    image: osixia/keepalived:2.0.20
    container_name: keepalived-master
    network_mode: host
    cap_add:
      - NET_ADMIN
      - NET_RAW
      - NET_BROADCAST
    volumes:
      - /data/dockerfiles/keepalived/master/keepalived.conf:/usr/local/etc/keepalived/keepalived.conf:ro
      - /data/dockerfiles/keepalived/master/check_mysql.sh:/usr/local/etc/keepalived/check_mysql.sh:ro
    restart: always
  keepalived-slave:
    image: osixia/keepalived:2.0.20
    container_name: keepalived-slave
    network_mode: host
    cap_add:
      - NET_ADMIN
      - NET_RAW
      - NET_BROADCAST
    volumes:
      - /data/dockerfiles/keepalived/slave/keepalived.conf:/usr/local/etc/keepalived/keepalived.conf:ro
      - /data/dockerfiles/keepalived/slave/check_mysql.sh:/usr/local/etc/keepalived/check_mysql.sh:ro
    restart: always
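Reasons
1. Fatal architectural flaws
VIP and host IP are in different subnets
- The VIP (192.168.1.100/24) and the host IP (192.168.10.25/24) are in different subnets
- The two cannot talk to each other directly at layer 2, so ARP broadcast requests for the VIP never arrive
- This breaks a basic assumption of the setup: the subnet configured on the interface (192.168.10.0/24) does not contain the VIP, so other hosts do not treat the VIP as directly reachable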
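The subnet mismatch described above is simple to verify before deploying anything. The following is a minimal sketch in plain shell arithmetic that tests whether an IPv4 address falls inside a CIDR network; the helper names `ip_to_int` and `ip_in_cidr` are invented for this example, and the addresses are the ones quoted in this article.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr <ip> <network/prefix>: succeed if ip lies inside the network.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# The host address is inside the interface's subnet, the VIP is not.
ip_in_cidr 192.168.10.25 192.168.10.0/24 && echo "host IP: on-link"
ip_in_cidr 192.168.1.100 192.168.10.0/24 || echo "VIP: outside the subnet"
```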
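Routing black hole
- The router has no route entry for the 192.168.1.0/24 network
- Return packets are dropped because of asymmetric routing
ARP proxying fails
- The host cannot answer ARP requests for the 192.168.1.0/24 network
- Clients keep sending ARP requests and never get a reply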
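2. Contradictions in the design
Design element | Contradiction | Consequence |
---|---|---|
Single-host VIP failover | A single node has no need for VIP failover | Adds complexity with no benefit |
Port redirection (DNAT) | Overlaps with what the VIP already does | Confusing traffic path |
Keepalived monitoring | Monitoring from a single point is meaningless | No real HA is achieved |
Cross-subnet VIP | Conflicts with standard IP subnetting | VIP is unreachable |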
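3. Conflicts with how network communication works
OSI model violation
- The layer-3 IP (192.168.1.100) is bound to the layer-2 interface (eth0)
- But eth0 is already bound to the 192.168.10.0/24 subnet
- This goes against the rule of thumb that one physical interface should not serve several logical subnets at the same time
ARP protocol limitation
# ARP request from the client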
ARP who-has 192.168.1.100 tell 192.168.10.24
# Response from the host
# The IP is not in the interface's subnet, so the kernel silently drops the request!