Preface
A MySQL one-master-multi-slave setup is a production-grade database deployment scheme. First, data is automatically backed up in multiple places, which strengthens data safety. Second, it scales the database: read/write splitting improves load capacity. Read operations from web and other business servers are handled by multiple slave databases, while write operations that modify data go to a single master database. Data files are backed up automatically through master-slave replication.
Below, a MySQL one-master-multi-slave deployment is built on a Kubernetes cluster using NFS, PV/PVC, ConfigMap and StatefulSet.
Image preparation
Pull the mysql image (the database server image) and push it to the local registry:
docker pull mysql:5.7.36
docker tag mysql:5.7.36 easzlab.io.local:5000/myhub/mysql:5.7.36
docker push easzlab.io.local:5000/myhub/mysql:5.7.36
Pull the xtrabackup image, used for master-slave replication, and push it to the local registry:
docker pull registry.cn-hangzhou.aliyuncs.com/hxpdocker/xtrabackup:1.0
docker tag registry.cn-hangzhou.aliyuncs.com/hxpdocker/xtrabackup:1.0 easzlab.io.local:5000/myhub/xtrabackup:1.0
docker push easzlab.io.local:5000/myhub/xtrabackup:1.0
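Before moving on, you can confirm that both images actually landed in the local registry. A minimal check, assuming easzlab.io.local:5000 is a plain Docker registry v2 reachable over HTTP without authentication:
# List the repositories and tags known to the local registry
curl -s http://easzlab.io.local:5000/v2/_catalog
curl -s http://easzlab.io.local:5000/v2/myhub/mysql/tags/list
curl -s http://easzlab.io.local:5000/v2/myhub/xtrabackup/tags/list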
NFS preparation
Create and export the shared directories on the NFS server 192.168.100.155:
# Create the 3 shared directories, owned by root:root.
mkdir -p /data/k8sdata/myserver/mysql-datadir-{1..3}
chown root.root /data/k8sdata/myserver/mysql-datadir-{1..3}
# Export the directories. The allowed clients are the host subnet; rw means read-write; no_root_squash keeps remote root as root, which MySQL needs.
cat << EOF >> /etc/exports
/data/k8sdata/myserver/mysql-datadir-1 192.168.100.0/24(rw,no_root_squash)
/data/k8sdata/myserver/mysql-datadir-2 192.168.100.0/24(rw,no_root_squash)
/data/k8sdata/myserver/mysql-datadir-3 192.168.100.0/24(rw,no_root_squash)
EOF
# Reload the exports and verify them
exportfs -r
exportfs -v
# NFS uses quite a few ports; in a test environment the firewall can simply be disabled
systemctl disable --now firewalld
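Before creating the PVs, it is worth verifying from one of the Kubernetes nodes that the exports are reachable and writable. A minimal check, assuming the NFS client utilities are installed on the node:
# Query the export list of the NFS server
showmount -e 192.168.100.155
# Test-mount one share, write a file as root, then clean up
mkdir -p /mnt/nfs-test
mount -t nfs 192.168.100.155:/data/k8sdata/myserver/mysql-datadir-1 /mnt/nfs-test
touch /mnt/nfs-test/.write-test && rm /mnt/nfs-test/.write-test
umount /mnt/nfs-test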
Creating PVs
Each MySQL pod binds one PV, so the number of pods must not exceed the number of PVs.
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-datadir-1
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteOnce
  nfs:
    path: /data/k8sdata/myserver/mysql-datadir-1
    server: 192.168.100.155
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-datadir-2
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteOnce
  nfs:
    path: /data/k8sdata/myserver/mysql-datadir-2
    server: 192.168.100.155
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-datadir-3
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteOnce
  nfs:
    path: /data/k8sdata/myserver/mysql-datadir-3
    server: 192.168.100.155
Check the PV status:
sudo kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
mysql-datadir-1 50Gi RWO Retain Available 13s
mysql-datadir-2 50Gi RWO Retain Available 13s
mysql-datadir-3 50Gi RWO Retain Available 13s
MySQL one master, multiple slaves
ConfigMap
Create a ConfigMap that holds the master and slave MySQL configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  namespace: myserver
  labels:
    app: mysql
data:
  master.cnf: |
    # Apply this config only on the master.
    [mysqld]
    log-bin
    log_bin_trust_function_creators=1
    lower_case_table_names=1
  slave.cnf: |
    # Apply this config only on slaves.
    [mysqld]
    super-read-only
    log_bin_trust_function_creators=1
kubectl apply -f mysql-configmap.yaml
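As a quick sanity check before the StatefulSet mounts it, confirm that both master.cnf and slave.cnf ended up in the ConfigMap:
# Both keys should appear under the Data section
kubectl describe configmap mysql -n myserver
kubectl get configmap mysql -n myserver -o yaml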
Headless Service
A Headless Service differs from a regular Service in that it has no Cluster IP; resolving its name returns the endpoints of all Pods behind it, which gives each StatefulSet member a stable DNS name.
mysql-read is a regular Service that load-balances read-only connections across all MySQL pods; writes have to go directly to the master at mysql-0.mysql. This is how read/write splitting is realized.
# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
  namespace: myserver
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: mysql
---
# Client service for connecting to any MySQL instance for reads.
# For writes, you must instead connect to the master: mysql-0.mysql.
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: myserver
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql
kubectl apply -f mysql-services.yaml
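Once the StatefulSet in the next section is running, the difference between the two Services can be seen directly from DNS. A small sketch, assuming an image with nslookup (busybox:1.28 here) can be pulled by the cluster:
# The headless service resolves to every pod IP and provides stable per-pod names such as mysql-0.mysql
kubectl run dns-test-1 -n myserver --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup mysql.myserver.svc.cluster.local
# mysql-read resolves to a single ClusterIP that load-balances reads across all MySQL pods
kubectl run dns-test-2 -n myserver --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup mysql-read.myserver.svc.cluster.local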
MySQL master and slaves
Create the one-master-multi-slave MySQL cluster. Each pod runs 4 containers (2 init containers and 2 main containers):
Init container init-mysql: based on the ordinal in the pod name, mysql-0 becomes the master and the others become slaves, and the matching configuration file is generated for each.
Init container clone-mysql: mysql-0 does nothing; mysql-1 clones a full copy of the data from mysql-0, mysql-2 clones from mysql-1, and so on.
Main container mysql: the database server itself; every instance is capable of reads and writes. Read/write splitting relies on the mysql and mysql-read Services.
Main container xtrabackup: keeps replication running after the initial clone; apart from that first clone, every slave replicates from mysql-0. It also listens on port 3307 to serve a full copy of the data to the next pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: myserver
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  replicas: 3
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql  # init container 1: decide master/slave from the pod name and generate the matching config file
        image: easzlab.io.local:5000/myhub/mysql:5.7.36
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate mysql server-id from pod ordinal index.
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1  # the hostname ends with the pod ordinal, an increasing integer
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          # Add an offset to avoid reserved server-id=0 value.
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # Copy appropriate conf.d files from config-map to emptyDir.
          if [[ $ordinal -eq 0 ]]; then  # the master (ordinal 0) gets master.cnf
            cp /mnt/config-map/master.cnf /mnt/conf.d/
          else  # all other ordinals are slaves and get slave.cnf
            cp /mnt/config-map/slave.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf  # temporary emptyDir volume shared with the mysql container
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      - name: clone-mysql  # init container 2: first full data clone from the previous pod (slave 3 clones from slave 2 instead of every slave cloning from the master; later incremental replication always follows the master)
        image: easzlab.io.local:5000/myhub/xtrabackup:1.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Skip the clone if data already exists.
          [[ -d /var/lib/mysql/mysql ]] && exit 0
          # Skip the clone on master (ordinal index 0).
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          [[ $ordinal -eq 0 ]] && exit 0  # ordinal 0 is the master, nothing to clone
          # Clone data from previous peer.
          ncat --recv-only mysql-$(($ordinal-1)).mysql 3307 | xbstream -x -C /var/lib/mysql  # stream the backup from the previous pod; xbstream unpacks it
          # Prepare the backup.
          xtrabackup --prepare --target-dir=/var/lib/mysql  # make the cloned data consistent before mysqld starts
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      containers:
      - name: mysql  # main container 1: the MySQL server
        image: easzlab.io.local:5000/myhub/mysql:5.7.36
        env:
        - name: MYSQL_ALLOW_EMPTY_PASSWORD
          value: "1"
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data  # data directory mounted at /var/lib/mysql
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf  # generated config files under /etc/mysql/conf.d
          mountPath: /etc/mysql/conf.d
        resources:  # resource requests
          requests:
            cpu: 500m
            memory: 1Gi
        livenessProbe:  # liveness probe
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:  # readiness probe
          exec:
            # Check we can execute queries over TCP (skip-networking is off).
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
      - name: xtrabackup  # main container 2: applies the master's binlog for ongoing replication and serves backups to peers
        image: easzlab.io.local:5000/myhub/xtrabackup:1.0
        ports:
        - name: xtrabackup
          containerPort: 3307
        command:
        - bash
        - "-c"
        - |
          set -ex
          cd /var/lib/mysql
          # Determine binlog position of cloned data, if any.
          if [[ -f xtrabackup_slave_info ]]; then
            # XtraBackup already generated a partial "CHANGE MASTER TO" query
            # because we're cloning from an existing slave.
            mv xtrabackup_slave_info change_master_to.sql.in
            # Ignore xtrabackup_binlog_info in this case (it's useless).
            rm -f xtrabackup_binlog_info
          elif [[ -f xtrabackup_binlog_info ]]; then
            # We're cloning directly from master. Parse binlog position.
            [[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
            rm xtrabackup_binlog_info
            # Generate the CHANGE MASTER TO statement.
            echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
                  MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
          fi
          # Check if we need to complete a clone by starting replication.
          if [[ -f change_master_to.sql.in ]]; then
            echo "Waiting for mysqld to be ready (accepting connections)"
            until mysql -h 127.0.0.1 -e "SELECT 1"; do sleep 1; done
            echo "Initializing replication from clone position"
            # In case of container restart, attempt this at-most-once.
            mv change_master_to.sql.in change_master_to.sql.orig
            # Run CHANGE MASTER TO and start the slave.
            mysql -h 127.0.0.1 <<EOF
          $(<change_master_to.sql.orig),
            MASTER_HOST='mysql-0.mysql',
            MASTER_USER='root',
            MASTER_PASSWORD='',
            MASTER_CONNECT_RETRY=10;
          START SLAVE;
          EOF
          fi
          # Start a server to send backups when requested by peers.
          # Listens on port 3307 and streams a full backup to the next pod when it clones.
          exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
            "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
Check the status of the master and slave pods:
sudo kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
mysql-0 2/2 Running 0 21m
mysql-1 2/2 Running 1 (20m ago) 21m
mysql-2 2/2 Running 0 18m
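Each pod received its data volume through volumeClaimTemplates; the claims are named data-mysql-<ordinal> and should now be Bound to the PVs created earlier:
# Every PVC should be Bound to one of the mysql-datadir-* PVs
sudo kubectl get pvc -n myserver
sudo kubectl get pv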
Exec into a slave pod and show the replication status:
kubectl exec -it mysql-1 -n myserver -- mysql
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: mysql-0.mysql
Master_User: root
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-0-bin.000003
Read_Master_Log_Pos: 316
Relay_Log_File: mysql-1-relay-bin.000002
Relay_Log_Pos: 484
Relay_Master_Log_File: mysql-0-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
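A simple end-to-end test of read/write splitting: write through the master at mysql-0.mysql, then read the same row back through mysql-read. The demo database and table names below are only examples:
# Write on the master (root with an empty password, as configured above)
kubectl run mysql-client-write -n myserver --rm -it --restart=Never \
  --image=easzlab.io.local:5000/myhub/mysql:5.7.36 -- \
  mysql -h mysql-0.mysql -e "CREATE DATABASE IF NOT EXISTS demo; CREATE TABLE IF NOT EXISTS demo.t (msg VARCHAR(64)); INSERT INTO demo.t VALUES ('hello');"
# Read through the read-only service; the row should have replicated to the slaves
kubectl run mysql-client-read -n myserver --rm -it --restart=Never \
  --image=easzlab.io.local:5000/myhub/mysql:5.7.36 -- \
  mysql -h mysql-read -e "SELECT * FROM demo.t; SELECT @@server_id;"
Running the read command several times shows @@server_id changing, i.e. mysql-read spreading queries across the instances.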
Troubleshooting: NFS directory permissions
When the pods are created they may end up in CrashLoopBackOff; kubectl logs shows Operation not permitted:
sudo kubectl logs mysql-0 -n myserver
# A pod has 4 containers; kubectl logs defaults to the mysql container
Defaulted container "mysql" out of: mysql, xtrabackup, init-mysql (init), clone-mysql (init)
# No permission to change the ownership of /var/lib/mysql/
chown: changing ownership of '/var/lib/mysql/': Operation not permitted
Cause: the NFS directories were exported with the wrong options.
Correct options: 192.168.100.0/24(rw,no_root_squash)
"squash" literally means to squash down: a remote user accessing the NFS server has its privileges reduced.
The no_*_squash options preserve identity: root stays root and tom stays tom; what is really preserved is the uid and gid.
The squash options can be combined in an NFS export as follows:
- (root_squash, no_all_squash): the default; root is mapped to nobody, ordinary users keep their identity.
- (no_root_squash, no_all_squash): root stays root after entering NFS (uid=gid=0), and ordinary users are unchanged as well.
- (root_squash, all_squash): both root and ordinary users are mapped to nobody; the reduced privileges are relatively safer.
- (no_root_squash, all_squash): once all_squash is present, no_root_squash no longer takes effect, so the result is the same as the previous combination.
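With the cause identified, a minimal recovery sequence looks like this (paths and pod names as above): fix the export options on the NFS server, re-export, then delete the crashed pod so the StatefulSet controller recreates it and the chown succeeds:
# On the NFS server: correct the options for the mysql-datadir-* exports
vi /etc/exports    # each line should end with 192.168.100.0/24(rw,no_root_squash)
exportfs -r
exportfs -v
# On the Kubernetes side: recreate the failing pod and watch it come back
sudo kubectl delete pod mysql-0 -n myserver
sudo kubectl get pod -n myserver -w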