RabbitMQ + STS + discovery_k8s + local PV: Deployment Pitfalls Explained

Published: 2025-08-15

#Author: 朱雷

I. Deployment Pitfalls

1.1. ConfigMap file

Edit the cm.yaml file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: rabbitmq-clu-9
data:
  rabbitmq.conf: |
    # Basic settings
    listeners.tcp.default = 5672
    # management.listener.port = 15672
    # management.listener.ssl = false
    disk_free_limit.absolute = 1GB
    # Use the k8s plugin for cluster peer discovery
    cluster_formation.peer_discovery_backend = k8s
    # cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.address_type = hostname
    # rabbitmq-clu-9 in the suffix is the namespace name; keep it consistent with the namespace in the sts file
    cluster_formation.k8s.hostname_suffix = .rabbitmq-headless.rabbitmq-clu-9.svc.cluster.local
    cluster_formation.discovery_retry_limit = 10
    cluster_formation.discovery_retry_interval = 3000
    # service_name must match the name of the headless Service
    cluster_formation.k8s.service_name = rabbitmq-headless
    cluster_formation.node_cleanup.interval = 30
    cluster_formation.node_cleanup.only_log_warning = false
    cluster_formation.etcd.ssl_options.verify = verify_none
    # Memory settings
    vm_memory_high_watermark.relative = 0.6
    vm_memory_high_watermark_paging_ratio = 0.5
    # Logging settings
    log.console = true
    log.console.level = debug
    log.file = false
    # Temporarily enable debug logging
    log.connection.level = debug
    log.channel.level = debug
    log.queue.level = debug

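To make the link between cluster_formation.k8s.hostname_suffix, the headless Service and the StatefulSet concrete, here is a minimal Python sketch of the node names peer discovery is expected to produce for this setup. The helper name expected_node_names is hypothetical, and the "rabbit" prefix assumes RabbitMQ's default node name prefix. The StatefulSet name (rabbitmq), replica count (3) and suffix come from the manifests in this article.

```python
from typing import List


def expected_node_names(sts_name: str, replicas: int, hostname_suffix: str,
                        prefix: str = "rabbit") -> List[str]:
    """Return the node name expected for each StatefulSet pod.

    StatefulSet pods are named <sts_name>-<ordinal>. With
    address_type = hostname, the configured hostname_suffix is appended
    to each pod hostname. The "rabbit" prefix is an assumed default.
    """
    return [f"{prefix}@{sts_name}-{i}{hostname_suffix}" for i in range(replicas)]


if __name__ == "__main__":
    suffix = ".rabbitmq-headless.rabbitmq-clu-9.svc.cluster.local"
    for name in expected_node_names("rabbitmq", 3, suffix):
        print(name)
```

If the namespace or Service name in the suffix differs from the one actually deployed, these names will not resolve and the peers cannot find each other.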
1.2. PV file

apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-cluster-pv-0
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: hostpath-storage
  hostPath:
    path: /tmp/rabbitmq/0
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - 192.168.88.201   # change to the hostname of the node to bind to
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-cluster-pv-1
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: hostpath-storage
  hostPath:
    path: /tmp/rabbitmq/1
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - 192.168.88.202   # change to the hostname of the node to bind to
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-cluster-pv-2
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: hostpath-storage
  hostPath:
    path: /tmp/rabbitmq/2
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - 192.168.88.203   # change to the hostname of the node to bind to

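Pitfall 1: The node selector matches on the cluster node's hostname (the kubernetes.io/hostname label). If a node's IP and hostname differ, entering the IP means no node satisfies the PV's nodeAffinity, so the Pod never runs and stays in the Pending state.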
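A quick way to catch this before applying the PVs is to compare the values written in each PV's nodeAffinity with the kubernetes.io/hostname label values the nodes actually carry (these can be listed with `kubectl get nodes -L kubernetes.io/hostname`). The sketch below is only an illustration: the function name unmatched_affinity_values and the sample node labels are hypothetical.

```python
from typing import Iterable, List


def unmatched_affinity_values(pv_values: Iterable[str],
                              node_hostname_labels: Iterable[str]) -> List[str]:
    """Return PV nodeAffinity values that match no node's hostname label."""
    labels = set(node_hostname_labels)
    return [value for value in pv_values if value not in labels]


if __name__ == "__main__":
    # Hypothetical label values, as reported by the cluster's nodes.
    node_labels = ["k8s-node-201", "k8s-node-202", "k8s-node-203"]
    # Values taken from the PV manifests; the first one is an IP by mistake.
    pv_values = ["192.168.88.201", "k8s-node-202", "k8s-node-203"]
    print(unmatched_affinity_values(pv_values, node_labels))
```

Any value reported here belongs to a PV that cannot be bound on any node, so the Pod using it would remain Pending.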

1.3. StorageClass file

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

1.4. ServiceAccount file

apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: rabbitmq-clu-9
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rabbitmq-peer-discovery
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "endpoints"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"] 
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rabbitmq-peer-discovery
subjects:
- kind: ServiceAccount
  name: rabbitmq
  namespace: rabbitmq-clu-9
roleRef:
  kind: ClusterRole
  name: rabbitmq-peer-discovery
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rabbitmq-configmap
  namespace: rabbitmq-clu-9
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "update"]
  resourceNames: ["rabbitmq-config"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rabbitmq-configmap
  namespace: rabbitmq-clu-9
subjects:
- kind: ServiceAccount
  name: rabbitmq
roleRef:
  kind: Role
  name: rabbitmq-configmap
  apiGroup: rbac.authorization.k8s.io

Pitfall 2: If the ClusterRole rabbitmq-peer-discovery is not granted access to the nodes resource, the Pod keeps reporting errors during startup and fails to start.

1.5. Headless Service file

apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-headless
  namespace: rabbitmq-clu-9
  labels:
    app: rabbitmq
spec:
  clusterIP: None
  ports:
  - name: amqp
    port: 5672
    targetPort: 5672
  - name: management
    port: 15672 
    targetPort: 15672
  - name: epmd
    port: 4369
    targetPort: 4369
  - name: dist
    port: 25672
    targetPort: 25672
  selector:
    app: rabbitmq
  publishNotReadyAddresses: true

Pitfall 3: All of the ports above must be exposed; otherwise cluster creation fails.
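When cluster formation fails, it helps to confirm that a peer is actually reachable on each of these ports. The following Python sketch, using only the standard library, tries a TCP connection per port. The function name check_ports is hypothetical, and the script is assumed to run from a Pod in the same cluster so that the headless Service DNS names resolve.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Try a TCP connection to each port and report whether it succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and connects within the timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and DNS resolution failures.
            results[port] = False
    return results
```

For example, `check_ports("rabbitmq-1.rabbitmq-headless.rabbitmq-clu-9.svc.cluster.local", [5672, 15672, 4369, 25672])` returns a mapping from port to reachability. A False for 4369 or 25672 suggests checking the Service definition first.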

1.6. StatefulSet (sts) file

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbitmq-clu-9
  labels:
    app: rabbitmq
spec:
  serviceName: rabbitmq-headless
  replicas: 3
  #podManagementPolicy: "Parallel"
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rabbitmq
              topologyKey: kubernetes.io/hostname
      serviceAccountName: rabbitmq        
      containers:
      - name: rabbitmq
        image: rabbitmq:3.8.27-management
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5672
          name: amqp
        - containerPort: 15672
          name: http
        env:
        - name: RABBITMQ_USE_LONGNAME
          value: "true"        
        - name: RABBITMQ_ERLANG_COOKIE
          value: "secret-cookie"
        - name: RABBITMQ_DEFAULT_USER
          value: "admin"
        - name: RABBITMQ_DEFAULT_PASS
          value: "admin123"
        volumeMounts:
        - name: config
          mountPath: /etc/rabbitmq/rabbitmq.conf
          subPath: rabbitmq.conf
        - name: data
          mountPath: /var/lib/rabbitmq
          readOnly: false
        #readinessProbe:
        #  exec:
        #    command: ["rabbitmq-diagnostics", "status"]
        #  initialDelaySeconds: 20
        #  periodSeconds: 30
        livenessProbe:
          exec:
            command: ["rabbitmq-diagnostics", "ping"]
          initialDelaySeconds: 60
          periodSeconds: 30
      volumes:
      - name: config
        configMap:
          name: rabbitmq-config
  volumeClaimTemplates:
  - metadata:
      name: data
      namespace: rabbitmq-clu-9
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1000M
      storageClassName: hostpath-storage

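Pitfall 4: The RABBITMQ_USE_LONGNAME environment variable must be set to true so that nodes use FQDN-style names. Otherwise the node name is truncated to the short hostname, nodes cannot communicate with each other, and the cluster fails to form.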
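The truncation can be illustrated with a small Python sketch. It is a simplified model, not RabbitMQ's actual implementation: it assumes that a short node name keeps only the part of the hostname before the first dot, and that the node name prefix is the default "rabbit". The function name erlang_node_name is hypothetical.

```python
def erlang_node_name(fqdn: str, use_longname: bool, prefix: str = "rabbit") -> str:
    """Model how the node name is derived from the pod's FQDN.

    Simplified assumption: short names keep only the first DNS label,
    long names keep the full FQDN.
    """
    host = fqdn if use_longname else fqdn.split(".", 1)[0]
    return f"{prefix}@{host}"


if __name__ == "__main__":
    fqdn = "rabbitmq-0.rabbitmq-headless.rabbitmq-clu-9.svc.cluster.local"
    print(erlang_node_name(fqdn, use_longname=True))
    print(erlang_node_name(fqdn, use_longname=False))
```

With short names the node would identify itself as rabbit@rabbitmq-0, while peer discovery with a hostname suffix looks for the full FQDN form, so the names never match.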

Pitfall 5: If podManagementPolicy is not set to "Parallel", the readiness probe must be disabled; otherwise cluster startup fails.

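II. Summary of Key Issues in RabbitMQ Cluster Deployment

The five core issues encountered when deploying the RabbitMQ cluster are summarized below. Checking each item before rollout should noticeably improve the chance of a successful deployment.

  1. Node selector binding: use the cluster node's hostname, not its IP; otherwise the Pod gets stuck in the Pending state.
  2. Role authorization: make sure the rabbitmq-peer-discovery role is granted access to the nodes resource; otherwise the Pod reports errors on startup.
  3. Port exposure: core ports such as 4369 (EPMD) and 25672 (Erlang distribution) must be exposed; otherwise cluster initialization fails.
  4. Long name configuration: set the environment variable RABBITMQ_USE_LONGNAME to true to force FQDN-style node names and avoid truncated names breaking node communication.
  5. Pod management policy: if the Parallel policy is not used, disable the readiness probe to keep cluster startup from blocking.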
