title: debian12安装Mariadb Galera Cluster+Nginx+Keepalived高可用多主集群
author: Loong
date: 2024-06-13
准备工作
主机 | IP | 说明 |
---|---|---|
galera-node1 | 192.168.100.111 | MariaDB Galera集群节点1 |
galera-node2 | 192.168.100.112 | MariaDB Galera集群节点2 |
galera-node3 | 192.168.100.113 | MariaDB Galera集群节点3 |
galera-nginx1 | 192.168.100.114 | Nginx+keepalived 主节点 VIP 192.168.100.110 |
galera-nginx2 | 192.168.100.115 | Nginx+keepalived 备份节点 |
需要5台虚拟机和6个ip
系统安装、主机名配置、网络配置等略
一、Mariadb Galera Cluster
参考:
mariadb官网 安装galera集群:Getting Started with MariaDB Galera Cluster - MariaDB Knowledge Base
Debian 12 Bookworm : MariaDB : 安装 : Server World (server-world.info)
Debian 12 Bookworm : MariaDB : Galera 集群 : 服务器世界 (server-world.info)
Installing MariaDB Galera Cluster with APT - MariaDB Knowledge Base(不推荐)centos7使用yum安装请参考
MariaDB Galera Cluster部署实战-腾讯云开发者社区-腾讯云 (tencent.com)
官网:使用 YUM 安装 MariaDB Galera 集群
MariaDB安装
安装软件包
DVD镜像里有这三个包,不配置apt source.list的话,装入dvd直接安装也好。我的debian12的带的mariadb版本是 MariaDB 10.11.4
apt install mariadb-server mariadb-client galera-4 -y
实际上apt install mariadb-server -y
会安装好这三个
(可选)修改一些配置:
更改最大连接数
sed -i "/#max_connections/a\max_connections=10000" /etc/mysql/mariadb.conf.d/50-server.cnf
#默认数据目录/var/lib/mysql 修改为/data/mysql_data
#sed -i "s/#datadir=.*/datadir=\/data\/mysql_data/g" /etc/mysql/mariadb.conf.d/50-server.cnf
可以通过配置/usr/lib/systemd/system/mariadb.service来调大打开文件数目
[Service]新添加两行如下参数:
#LimitNOFILE=10000 这个配置文件里默认有 LimitNOFILE=32768
LimitNPROC=10000
启动mariadb
systemctl restart mariadb
初始化配置
运行mysql_secure_installation
进行数据库初始化配置
mysql_secure_installation
Enter current password for root (enter for none): //初次运行时密码为空,直接回车
Switch to unix_socket authentication [Y/n] , so you can safely answer 'n'.//n
Change the root password? [Y/n]//设置数据库的root用户密码 y
New password: //设置root用户的密码
Re-enter new password: //再次输入你设置的密码
Remove anonymous users? [Y/n] //是否删除匿名用户,生产环境建议删除,所以直接回车
Disallow root login remotely? [Y/n] //是否禁止root远程登录,根据自己的需求选择Y/n并回车,建议禁止 直接回车
Remove test database and access to it? [Y/n] //是否删除test数据库,直接回车
Reload privilege tables now? [Y/n] //是否重新加载权限表,直接回车
连接测试
登录
mysql -uroot -p
执行一些查询命令
show databases; /*列出数据库*/
show grants for root@localhost;/*查看用户在localhost登录的权限*/
select user,host,password from mysql.user; /*列出用户以及允许登录的主机*/
\q /*退出*/
galera配置
node1节点:
vi /etc/mysql/mariadb.conf.d/50-server.cnf
#第27行注释掉
#bind-address = 127.0.0.1
#末尾加上
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
#集群中所有节点
wsrep_cluster_address="gcomm://192.168.100.111,192.168.100.112,192.168.100.113"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# 集群名称
wsrep_cluster_name="MariaDB_Cluster"
# 本节点的ip
wsrep_node_address="192.168.100.111"
先停止数据库 systemctl stop mariadb
然后特别的方式启动mariadb来引导新集群第一个节点(此处不一定是node1节点,都可以是第一个节点。如果重启集群时,引导节点也要执行这条命令)
在使用 systemd 的操作系统上,可以通过以下方式引导节点:
galera_new_cluster
无systemd的系统上mariadbd --wsrep-new-cluster 来引导新集群的第一个节点。(连接到现有群集时,请勿使用 --wsrep-new-cluster 选项)
node2/node3节点:
vi /etc/mysql/mariadb.conf.d/50-server.cnf
#第27行注释掉
#bind-address = 127.0.0.1
#末尾加上
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.100.111,192.168.100.112,192.168.100.113"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_cluster_name="MariaDB_Cluster"
# 本节点的ip
wsrep_node_address="192.168.100.112" #node3上wsrep_node_address="192.168.100.113"
正常方式重启mariadb即可:
systemctl restart mariadb
查看集群状态
在任意虚拟机里执行以下命令
mysql -e "show status like 'wsrep_%'" #查看集群全部相关状态参数
mysql -e "show status like 'wsrep_cluster_size'" # 这里应该显示集群里有3个节点
mysql -e "show status like 'wsrep_connected'" # 这里应该显示ON
mysql -e "show status like 'wsrep_incoming_addresses'" # 这里应该显示三个节点的地址
mysql -e "show status like 'wsrep_local_state_comment'" # 这里节点的同步状态
自启动问题
参考https://cloud.tencent.com/developer/article/1120419
以后启动都要先一个节点上galera_new_cluster
另外两个节点systemctl start mariadb
如果
galera_new_cluster
启动报错Jun 25 13:20:35 node1 mariadbd[1388]: 2024-06-25 13:20:35 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluste> Jun 25 13:20:35 node1 mariadbd[1388]: 2024-06-25 13:20:35 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.100.111,192.168.100.112,192.168.100.113) failed: 7 Jun 25 13:20:35 node1 mariadbd[1388]: 2024-06-25 13:20:35 0 [ERROR] Aborting Jun 25 13:20:35 node1 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
那就说明此节点不是最后一个离开集群的节点。(
/var/lib/mysql/grastate.dat
中safe_to_bootstrap
为1,说明是最后一个离开集群的节点)尝试从其他节点执行galera_new_cluster
再不行,修改其中一个节点
/var/lib/mysql/grastate.dat
将safe_to_bootstrap
设置为1,来指示某个节点是安全的引导节点 然后在这个节点里galera_new_cluster
再不行,删除/var/lib/mysql/gvwstate.dat
和/var/lib/mysql/gvwstate.dat
(如果存在)
很多服务默认自启的
查看服务默认的启动状态:systemctl list-unit-files | grep nginx
nginx.service enabled enabled
root@galera-nginx1:~# systemctl list-unit-files
UNIT FILE STATE PRESET
。。。查看服务当前的自启设置 service --status-all
root@galera-nginx1:~# service --status-all
[ + ] apparmor
[ + ] chrony
[ - ] console-setup.sh
[ + ] cron
[ + ] dbus
[ - ] hwclock.sh
[ + ] ipvsadm
[ + ] keepalived
[ - ] keyboard-setup.sh
[ + ] kmod
[ + ] networking
[ + ] nginx
[ + ] procps
[ + ] qemu-guest-agent
[ + ] ssh
[ - ] sudo
[ + ] udev
[ + ] zabbix-agent
自制启动脚本
##host_list={galera-node1,galera-node2,galera-node3}
#注意先做好galera-node1到1、2、3的免密登录和/etc/hosts
ssh_port=22
#开始重启任务的时间
echo "task start at: $(date +%Y-%m-%d_%H:%M:%S)"
#本机上用于暂存各个数据库节点的状态的文件
bootstrap_tmpfile=/tmp/.db_stat.$(date +%Y%m%d%H%M)
#如果有未停止的,先全部停止一遍数据库服务
stop_db(){
ssh galera-node3 'systemctl stop mariadb'
sleep 1
ssh galera-node2 'systemctl stop mariadb'
sleep 1
ssh galera-node1 'systemctl stop mariadb'
}
#检查各个数据库节点的状态,保存到bootstrap_tmpfile
check_status(){
for host in {galera-node1,galera-node2,galera-node3};
do
echo ${host} $(ssh ${host} "grep safe_to_bootstrap /var/lib/mysql/grastate.dat"|awk '{print $NF}')
done > ${bootstrap_tmpfile}
}
#检查
start_db(){
#如果存在且仅存在一个 safe_to_bootstrap为1的节点 则引导节点为此节点
if [ $(awk '/ 1/{print $1}' ${bootstrap_tmpfile}|wc -l) = 1 ]; #注意awk关键词里面有个空格
then
host1=$(awk '/ 1/{print $1}' ${bootstrap_tmpfile});
#如果不存在safe_to_bootstrap为1的节点或者存在多个 则需要指定一个节点为引导节点 这里指定的是node1节点
else
host1=galera-node1
ssh $host1 "sed -i '/bootstrap/c\safe_to_bootstrap: 1' /var/lib/mysql/grastate.dat"
#修改其它节点的safe_to_bootstrap为0
for host in $(awk '{print $1}' ${bootstrap_tmpfile}|grep -v ${host1});do
ssh $host "sed -i '/bootstrap/c\safe_to_bootstrap: 0' /var/lib/mysql/grastate.dat"
done
fi
#在引导节点上执行引导集群命令
ssh $host1 'galera_new_cluster'
sleep 3
#在其它节点上正常启动数据库服务
for host in $(awk '{print $1}' ${bootstrap_tmpfile}|grep -v ${host1});do
ssh $host 'systemctl start mariadb'
done
}
main(){
stop_db
check_status
start_db
rm -f ${bootstrap_tmpfile}
echo "task over at: $(date +%Y-%m-%d_%H:%M:%S)"
}
main
各节点上systemctl disable mariadb
在node1节点上 配置开机自启
防火墙配置
MariaDB/MySQL Galera服务器的防火墙规则https://blog.csdn.net/echolivia/article/details/51555051
3306,这个是MariaDB/MySQL的服务端口,这个都不开那就不用跑MariaDB/MySQL服务了。
4567,Galera做数据复制的通讯和数据传输端口,需要在防火墙放开TCP和UDP
4568,Galera做增量数据传输使用的端口(Incremental State Transfer, IST),需要防火墙放开TCP
4444,Galera做快照状态传输使用的端口(State Snapshot Transfer, SST),需要防火墙放开TCP
二、Nginx
Nginx Stream(TCP/UDP)负载均衡 (w3ccoo.com)
两种hash算法的区别
Nginx实现四层/七层负载均衡 - 彼扬 - 博客园 (cnblogs.com)
安装软件包
apt install nginx -y
安装stream支持
apt install libnginx-mod-stream -y
查看是否启用stream模块 ls /etc/nginx/modules-enabled/ 50-mod-stream.conf apt list --installed | grep stream WARNING: apt does not have a stable CLI interface. Use with caution in scripts. libnginx-mod-stream/stable,now 1.22.1-9 amd64 [installed,automatic]
也可以直接apt install nginx-full 会默认安装模块
修改配置文件
cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak
cat /etc/nginx/nginx.conf
user www-data;
worker_processes auto;
pid /run/nginx.pid;
error_log /var/log/nginx/error.log;
include /etc/nginx/modules-enabled/*.conf;
events {
worker_connections 100000;#连接改大
# multi_accept on;
}
#加入stream
stream {
include /etc/nginx/conf.d/stream*.conf;
}
#注释掉或者删除掉http{。。。}
cat > /etc/nginx/conf.d/stream_mysql_cluster.conf <<EOF
upstream mysql_cluster {
#least_conn;#把请求转发给连接数较少的后端
#ip_hash;#通过ip每次把请求尽量发往同一个后端 这个是http里的写法
#参考
#https://blog.csdn.net/kaige8312/article/details/145731636
#https://blog.csdn.net/Tyro_java/article/details/141635599
#https://zhuanlan.zhihu.com/p/565477967
#https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/
#https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/
#hash $remote_addr consistent;
#虚拟节点数设置为32甚至更大,因此即使很少的服务节点也能做到相对均匀的数据分布
hash $remote_addr consistent=1000;
server 192.168.100.111:3306;
server 192.168.100.112:3306;
server 192.168.100.113:3306;
}
server {
listen 3306;
proxy_pass mysql_cluster;
}
EOF
重启Nginx
nginx -t #检查配置
systemctl restart nginx
测试
先在数据库里创建一个可以远程登录的用户testuser1
grant all privileges on *.* to 'testuser1'@'%' identified by '123456' with grant option;
安装客户端通过nginx代理访问数据库
apt install mariadb-client -qy
mysql -uroot -p -h192.168.100.114
三、Keepalived
安装软件包
apt install keepalived
mv /etc/keepalived/keepalived.conf.sample /etc/keepalived/keepalived.conf
如果同一网段中有其他Keepalived要先查看该网络中所存在的vrid:
tcpdump -nn -i any net 192.168.100.0/8
注意: 同一网段中virtual_router_id的值不能重复,否则会出错,相关错误信息如下。
Keepalived_vrrp[27120]: ip address associated with VRID not present in received packet :
one or more VIP associated with VRID mismatch actual MASTER advert
bogus VRRP packet received on eth1 !!!
receive an invalid ip number count associated with VRID!
VRRP_Instance(xxx) ignoring received advertisment…
可以用这条命令来查看该网络中所存在的vrid:
tcpdump -nn -i any net 192.168.100.0/8
原文链接:https://blog.csdn.net/Kim_Weir/article/details/78991854
修改配置文件
查看网卡名ip a
Nginx1(Master节点)
vi /etc/keepalived/keepalived.conf
global_defs {
notification_email {
test@xxx.com
}
notification_email_from info@xxx.com
smtp_server 192.168.x.x
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
# vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_instance VI_1 {
state MASTER
interface enp0s3#网卡改
virtual_router_id 3
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass dbS3n2 #密码改
}
virtual_ipaddress {
192.168.100.110/24
}
}
Nginx2(Backup节点)
vi /etc/keepalived/keepalived.conf
global_defs {
notification_email {
test@xxx.com
}
notification_email_from info@xxx.com
smtp_server 192.168.x.x
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
# vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_instance VI_1 {
state BACKUP
interface enp0s3#网卡改
virtual_router_id 3 #vrid要相同
priority 99 #优先级小
advert_int 1
authentication {
auth_type PASS
auth_pass dbS3n2 #密码改
}
virtual_ipaddress {
192.168.100.110/24
}
}
重启Keepalived
systemctl start keepalived
防火墙
两台nginx要互通vrrp协议 不然收不到广播。
否则master重启后,当前的master节点Nginx2不会变成backup状态,出现两台都是master、都有vip的情况,即发生了脑裂现象。
VRRP协议在四层,不是tcp,故也没有端口。
笔者的虚拟机中禁用了防火墙,PVE平台上直接配置放通vrrp协议即可。
测试
数据库连接测试:
#通过vip连接数据库
mysql -uroot -p -h192.168.100.110
VIP转移测试:
关闭Nginx1或者关闭Keepalived服务,此时业务不断,可用tcping -t 192.168.100.110 3306
测试
在Nginx2上查看ip,vip浮动到了Nginx2的网卡上,nginx2成为了master
启动nginx1时,会切换回master,nginx2会切换回backup。
systemctl status keepalived -l #查看详细的日志