Redis Sentinel: Guardian of High-Availability Architecture - EW帮帮网

🛡️ Redis Sentinel: Guardian of High-Availability Architecture

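🧠 1. Sentinel Architecture Overview

💡 Why Sentinel?
Redis Sentinel is the high-availability solution for Redis. It closes the main gaps in a plain master-replica replication setup:

Automatic failure detection: monitors the master without manual intervention
Automatic failover: promotes a replica when the master fails
Configuration management: clients learn the current master address from Sentinel after a failover
Notification: reports state changes and alerts as they happen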

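📊 Sentinel Architecture Model

Core Sentinel functions:

👁️ Monitoring: continuously checks the health of the master and its replicas
🔔 Notification: publishes system events through an API or scripts
🔄 Automatic failover: promotes a replica when the master fails
📋 Configuration provider: tells clients the address of the current master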

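📋 Sentinel vs. Manual Master-Replica Switchover

Feature	Sentinel automatic failover	Manual switchover
Response time	Detected automatically within seconds (bounded by down-after-milliseconds)	Minutes, waiting on a human
Availability	99.99%+ is achievable with a sound deployment	Depends on operator availability
Configuration consistency	Configuration synchronized automatically	Prone to configuration mistakes
Operating cost	Low (one-time setup)	High (ongoing manual work)
Failure handling	Fully automated	Manual steps with a high risk of error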

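⚡ 2. Failover in Depth

💡 Failure Detection

Sentinel combines heartbeat checks with agreement among Sentinels to decide a node's state:

State transitions:

Subjectively down (SDOWN): a single Sentinel considers the node unreachable
Objectively down (ODOWN): at least quorum Sentinels (the last argument of sentinel monitor) agree that the master is unreachable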
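
The two thresholds are easy to mix up, so here is a minimal sketch of how they differ. The class and method names are hypothetical and this is not Sentinel's source code; it only assumes the documented rule that ODOWN needs the configured quorum, while actually starting a failover needs votes from a majority of all Sentinels (and never fewer than the quorum).

```java
// Sketch of the ODOWN decision and the failover authorization rule.
// Names are illustrative assumptions, not part of Redis.
final class DownStateCheck {
    // ODOWN: at least `quorum` Sentinels (this one included) report the master as down.
    static boolean isObjectivelyDown(int sentinelsReportingDown, int quorum) {
        return sentinelsReportingDown >= quorum;
    }

    // Failover authorization: the would-be leader needs a strict majority of all
    // known Sentinels, and never fewer votes than `quorum`.
    static boolean canStartFailover(int votes, int totalSentinels, int quorum) {
        int majority = totalSentinels / 2 + 1;
        return votes >= Math.max(majority, quorum);
    }

    public static void main(String[] args) {
        // 5 Sentinels with quorum 2: two reports are enough for ODOWN,
        // but the failover still needs 3 votes.
        System.out.println(isObjectivelyDown(2, 2));    // true
        System.out.println(canStartFailover(2, 5, 2));  // false
        System.out.println(canStartFailover(3, 5, 2));  // true
    }
}
```

In other words, a low quorum makes detection more sensitive, but it does not let a minority of Sentinels carry out a failover on their own.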

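🚨 Failover Process

Leader election:

Uses a Raft-like leader election
Requires votes from a majority of all Sentinels
Each Sentinel votes for the first candidate that asks in a given epoch, so the Sentinel that detects the failure first usually becomes leader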
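
The election rules above can be sketched as a small model. This is a simplified illustration under stated assumptions (one vote per epoch, first come first served, majority wins); SentinelVote and its methods are hypothetical names, and the real implementation tracks more state than this.

```java
import java.util.Map;

// Simplified model of Sentinel leader election. All names are hypothetical.
final class SentinelVote {
    private long lastVotedEpoch = 0;
    private String votedFor = null;

    // A Sentinel grants at most one vote per epoch, to the first candidate that asks.
    // The return value is the run ID this Sentinel currently votes for.
    String requestVote(String candidateRunId, long epoch) {
        if (epoch > lastVotedEpoch) {
            lastVotedEpoch = epoch;
            votedFor = candidateRunId;
        }
        return votedFor;
    }

    // A candidate becomes leader only with votes from a strict majority of all Sentinels.
    static String winner(Map<String, Integer> votesByCandidate, int totalSentinels) {
        int majority = totalSentinels / 2 + 1;
        for (Map.Entry<String, Integer> entry : votesByCandidate.entrySet()) {
            if (entry.getValue() >= majority) {
                return entry.getKey();
            }
        }
        return null; // no majority: the election is retried in a later epoch
    }
}
```

A second candidate asking in the same epoch simply gets back the run ID of the first one, which is why two Sentinels cannot both win the same epoch.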

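⚙️ Failover Steps in Detail

1. Detection phase:

# Sentinel sends PING (no arguments) to every monitored instance
PING

# No valid reply within this window marks the instance SDOWN (default: 30 seconds)
sentinel down-after-milliseconds mymaster 30000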

2. Election phase:

# A Sentinel asks its peers for a vote
SENTINEL is-master-down-by-addr <ip> <port> <current-epoch> <runid>

# Reply from each peer
<down-state> <leader-runid> <leader-epoch>

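3. Failover phase:

# Promote the chosen replica to master (REPLICAOF NO ONE in Redis 5.0 and later)
SLAVEOF NO ONE

# Repoint the remaining replicas at the new master
SLAVEOF <new-master-ip> <new-master-port>

# Persist the new replication settings to each node's redis.conf
CONFIG REWRITE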

4. Recovery phase:

# When the old master comes back, Sentinel reconfigures it as a replica
SLAVEOF <new-master-ip> <new-master-port>

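🔧 3. Hands-On Configuration Guide

⚙️ Basic Sentinel Configuration

sentinel.conf:

# Monitor a master ("mymaster" is a name you choose); the trailing 2 is the quorum
sentinel monitor mymaster 127.0.0.1 6379 2

# Time without a valid reply before the master is marked SDOWN (milliseconds)
sentinel down-after-milliseconds mymaster 30000

# Failover timeout (milliseconds)
sentinel failover-timeout mymaster 180000

# Number of replicas reconfigured to sync with the new master at the same time
sentinel parallel-syncs mymaster 1

# Password for the master and its replicas (if they require one)
sentinel auth-pass mymaster your_strong_password

# Log file
logfile "/var/log/redis/sentinel.log"

# Run as a daemon
daemonize yes

# Protected mode (disabled here so other hosts can connect; restrict access at the network level)
protected-mode no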

🚀 Deployment Steps

1. Prepare the environment (3 servers):

# Server layout:
# Node 1: Redis master + Sentinel
# Node 2: Redis replica + Sentinel
# Node 3: Redis replica + Sentinel

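2. Configure master-replica replication:

# Master redis.conf
requirepass master_password
masterauth master_password

# Replica redis.conf
slaveof 192.168.1.100 6379
masterauth master_password
# Use the same password as the master: sentinel auth-pass sends one password to
# every node, and any replica may be promoted to master
requirepass master_password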

3. Start Sentinel:

# Start the Sentinel process
redis-sentinel /path/to/sentinel.conf

# Or run redis-server in Sentinel mode
redis-server /path/to/sentinel.conf --sentinel

4. Verify the deployment:

# Show Sentinel information
redis-cli -p 26379 info sentinel

# Sample output:
# sentinel_masters:1
# sentinel_tilt:0
# sentinel_running_scripts:0
# sentinel_scripts_queue_length:0
# sentinel_simulate_failure_flags:0
# master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=2,sentinels=3
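
If you want to check that master0 line from a script or health probe rather than by eye, it can be split into fields. The sketch below is one possible way to do it; SentinelInfoParser and isHealthy are hypothetical helpers, and the expected replica and Sentinel counts are assumptions you would set for your own deployment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that parses one "masterN:..." line of INFO sentinel output.
final class SentinelInfoParser {
    // Example input:
    // master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=2,sentinels=3
    static Map<String, String> parseMasterLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String trimmed = line.trim();
        // Everything after the first ':' is a comma-separated list of key=value pairs.
        String body = trimmed.substring(trimmed.indexOf(':') + 1);
        for (String pair : body.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }

    // Healthy here means: status is ok and enough replicas and Sentinels are visible.
    static boolean isHealthy(Map<String, String> fields, int expectedReplicas, int expectedSentinels) {
        return "ok".equals(fields.get("status"))
                && Integer.parseInt(fields.getOrDefault("slaves", "0")) >= expectedReplicas
                && Integer.parseInt(fields.getOrDefault("sentinels", "0")) >= expectedSentinels;
    }
}
```

For the sample output above, isHealthy(fields, 2, 3) holds; a drop in the sentinels count is often the first sign that one Sentinel has lost contact with the others.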

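📊 Client Integration

Java client example:

import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelAwareClient {
    private JedisSentinelPool sentinelPool;

    public void init() {
        // Addresses of the Sentinels, not of the Redis data nodes
        Set<String> sentinels = new HashSet<>();
        sentinels.add("192.168.1.100:26379");
        sentinels.add("192.168.1.101:26379");
        sentinels.add("192.168.1.102:26379");

        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxTotal(100);

        // The pool asks the Sentinels for the current address of "mymaster"
        sentinelPool = new JedisSentinelPool("mymaster", sentinels, poolConfig);
    }

    public String get(String key) {
        // try-with-resources returns the connection to the pool
        try (Jedis jedis = sentinelPool.getResource()) {
            return jedis.get(key);
        }
    }
}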

Spring Boot configuration:

# application.yml (Spring Boot 2.x prefix; Spring Boot 3 uses spring.data.redis)
spring:
  redis:
    sentinel:
      master: mymaster
      nodes:
        - 192.168.1.100:26379
        - 192.168.1.101:26379
        - 192.168.1.102:26379
    password: your_redis_password

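🚨 4. Common Problems and Tuning

⚠️ Split-Brain and How to Mitigate It

Split-brain scenario: a network partition cuts the old master off from the Sentinels while some clients can still reach it. The Sentinels promote a replica, both nodes accept writes for a while, and the writes made to the old master are discarded when it rejoins as a replica.

Anti-split-brain configuration:

# redis.conf: minimum number of connected replicas required to accept writes
min-slaves-to-write 1

# redis.conf: maximum replica lag in seconds
min-slaves-max-lag 10

# sentinel.conf: password Sentinel uses to reach the nodes, needed to demote the old master when it returns
sentinel auth-pass mymaster your_password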
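
To make the effect of the two write-guard settings concrete, here is a minimal sketch of the rule a master applies before accepting a write. WriteGuard is a hypothetical name and the lag values are made-up inputs; the sketch only assumes the documented behavior that a replica counts as healthy when its lag does not exceed the configured maximum, and that a minimum of 0 disables the check.

```java
import java.util.List;

// Hypothetical model of the min-slaves-to-write / min-slaves-max-lag rule.
final class WriteGuard {
    // replicaLagSeconds: seconds since each connected replica last acknowledged the master.
    static boolean acceptsWrites(List<Integer> replicaLagSeconds, int minReplicas, int maxLagSeconds) {
        if (minReplicas <= 0) {
            return true; // the check is disabled
        }
        long healthy = replicaLagSeconds.stream()
                .filter(lag -> lag <= maxLagSeconds)
                .count();
        return healthy >= minReplicas;
    }
}
```

With the values above (1 replica, 10 seconds), an isolated old master stops accepting writes roughly 10 seconds after it loses its replicas, which bounds how much data a split-brain can cost.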

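⏱️ Reducing Failover Time

Approximate time spent per phase:

Phase	Default time	Tuning lever	Target time
Subjectively down	30 s	down-after-milliseconds	10-15 s
Objectively down	30-60 s	Add Sentinel nodes	10-20 s
Leader election	10-30 s	Improve the network	5-10 s
Replica promotion	1-5 s	Pre-configuration	<1 s
Configuration propagation	1-3 s	Client-side caching	<1 s
Total	60-120 s	All of the above	20-40 s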

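Tuned configuration example:

# Detection tuning
sentinel down-after-milliseconds mymaster 10000
sentinel parallel-syncs mymaster 2

# Timeout tuning
sentinel failover-timeout mymaster 60000

# Election tuning: Sentinel has no election-timeout directive (an unknown directive
# prevents startup); the election timeout is capped by failover-timeout above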

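🔍 Monitoring and Alerting

Key monitoring commands:

# Sentinel status
redis-cli -p 26379 info sentinel

# Current master state (watch for an address change after a failover)
redis-cli -p 26379 sentinel master mymaster

# Replica status
redis-cli -p 26379 sentinel slaves mymaster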

Alert script configuration:

# sentinel.conf alert hooks
sentinel notification-script mymaster /path/to/notification.sh
sentinel client-reconfig-script mymaster /path/to/reconfig.sh

Alert script example:

#!/bin/bash
# notification.sh: Sentinel passes the event type and its description as arguments
EVENT_TYPE=$1
EVENT_DESCRIPTION=$2

case "$EVENT_TYPE" in
    +sdown)
        echo "Subjectively down: $EVENT_DESCRIPTION" | mail -s "Redis alert" admin@example.com
        ;;
    +odown)
        echo "Objectively down: $EVENT_DESCRIPTION" | mail -s "Redis urgent alert" admin@example.com
        ;;
    +switch-master)
        echo "Master switched: $EVENT_DESCRIPTION" | mail -s "Redis master switched" admin@example.com
        ;;
esac
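
If the alert should feed a Java service instead of an email, the description of a +switch-master event can be parsed into its parts. The sketch below assumes the documented layout of that description (master name, old IP, old port, new IP, new port); the SwitchMasterEvent class itself is a hypothetical helper.

```java
// Hypothetical value class for the description of a +switch-master event:
// <master-name> <old-ip> <old-port> <new-ip> <new-port>
final class SwitchMasterEvent {
    final String masterName;
    final String oldHost;
    final int oldPort;
    final String newHost;
    final int newPort;

    private SwitchMasterEvent(String[] parts) {
        masterName = parts[0];
        oldHost = parts[1];
        oldPort = Integer.parseInt(parts[2]);
        newHost = parts[3];
        newPort = Integer.parseInt(parts[4]);
    }

    // Splits on whitespace and rejects anything that does not have exactly five fields.
    static SwitchMasterEvent parse(String description) {
        String[] parts = description.trim().split("\\s+");
        if (parts.length != 5) {
            throw new IllegalArgumentException("expected 5 fields, got " + parts.length);
        }
        return new SwitchMasterEvent(parts);
    }
}
```

For example, parsing "mymaster 192.168.1.100 6379 192.168.1.101 6379" yields a new master at 192.168.1.101:6379.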

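💡 5. Summary and Best Practices

📊 Sentinel vs. Cluster

Feature	Redis Sentinel	Redis Cluster	Consideration
Data sharding	❌ Not supported	✅ Supported	Large data sets
Automatic failover	✅ Supported	✅ Supported	High availability
Horizontal scaling	❌ Not supported	✅ Supported	Throughput
Client support	Widely supported	Requires a Cluster-aware client	Compatibility
Deployment complexity	Simple	Complex	Operating cost
Data consistency	Asynchronous replication; acknowledged writes can be lost on failover	Asynchronous replication; acknowledged writes can be lost on failover	Business requirements
Recommended scale	Small to medium (<100 GB)	Large (>100 GB)	Data volume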

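🎯 Architecture Selection Guide

🔧 Production Best Practices

1. Deployment rules:

✅ Sentinel count: at least 3, and an odd number
✅ Placement: spread across racks or availability zones
✅ Network: low-latency internal network
✅ Monitoring and alerting: cover every node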
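
The "at least 3, and odd" rule follows from simple majority arithmetic, sketched below. SentinelSizing is a hypothetical helper; it only assumes that a failover needs votes from a strict majority of all Sentinels.

```java
// Hypothetical helper showing why an odd Sentinel count is recommended.
final class SentinelSizing {
    // Votes needed to authorize a failover.
    static int majority(int sentinels) {
        return sentinels / 2 + 1;
    }

    // How many Sentinels can fail while a majority is still reachable.
    static int tolerableFailures(int sentinels) {
        return sentinels - majority(sentinels);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " Sentinels: majority " + majority(n)
                    + ", tolerates " + tolerableFailures(n) + " failure(s)");
        }
    }
}
```

Going from 3 to 4 Sentinels raises the majority from 2 to 3 but still tolerates only one failure, so the extra node adds cost without adding fault tolerance; 5 is the next count that helps.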

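2. Configuration tuning:

# Recommended production configuration
sentinel monitor mymaster 192.168.1.100 6379 2
sentinel down-after-milliseconds mymaster 10000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 2
sentinel auth-pass mymaster your_strong_password
# With protected-mode yes, also set a bind address or requirepass; otherwise only loopback connections are accepted
protected-mode yes
daemonize yes
logfile "/var/log/redis/sentinel.log"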

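3. Client practices:

// Client retry strategy
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class RedisClient {
    private JedisSentinelPool pool;

    public String getWithRetry(String key, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            // try-with-resources returns the connection to the pool on every path
            try (Jedis jedis = pool.getResource()) {
                return jedis.get(key);
            } catch (RuntimeException e) {
                if (i == maxRetries - 1) throw e; // out of retries
                waitForRetry(i);
            }
        }
        return null; // only reached when maxRetries <= 0
    }

    // Linear backoff: 1 s, 2 s, 3 s, ... capped at 5 s
    private void waitForRetry(int attempt) {
        try {
            Thread.sleep(Math.min(1000 * (attempt + 1), 5000));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}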
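
When choosing maxRetries, it helps to know how long the client will keep trying in total, because that budget should exceed the expected failover time. The sketch below reproduces the backoff formula from the retry client above; BackoffSchedule is a hypothetical helper and does no network I/O.

```java
// Hypothetical helper that computes the waiting time implied by the retry loop above.
final class BackoffSchedule {
    // Same formula as waitForRetry: 1 s per attempt, capped at 5 s.
    static long delayMillis(int attempt) {
        return Math.min(1000L * (attempt + 1), 5000L);
    }

    // The last attempt rethrows instead of sleeping, so only maxRetries - 1 waits happen.
    static long totalWaitMillis(int maxRetries) {
        long total = 0;
        for (int attempt = 0; attempt < maxRetries - 1; attempt++) {
            total += delayMillis(attempt);
        }
        return total;
    }
}
```

With maxRetries = 5 the client waits 10 seconds in total, which is no longer than a down-after-milliseconds of 10000 on its own; covering a failover of 20 seconds or more takes at least 8 retries (25 seconds of waiting).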

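📈 Capacity Planning

Sizing guide:

Data size	Recommended architecture	Node spec	Estimated availability
< 16 GB	1 master, 2 replicas + 3 Sentinels	4 cores, 8 GB	99.9%
16-64 GB	1 master, 2 replicas + 3 Sentinels	8 cores, 16 GB	99.95%
64-256 GB	1 master, 3 replicas + 5 Sentinels	16 cores, 32 GB	99.99%
> 256 GB	Redis Cluster	Multiple shards, multiple replicas	99.999%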

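🚀 Failure Drill Plan

Regular drills:

Master restart: verify automatic failover
Network partition: exercise split-brain handling
Sentinel failure: test the agreement mechanism
Replica promotion: verify data consistency
Client reconnection: test client recovery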

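Drill script example:

#!/bin/bash
# Simulate a master failure (DEBUG SEGFAULT crashes the server; never run this against production)
echo "Simulating master failure..."
redis-cli -h redis-master debug segfault

echo "Waiting for failover..."
sleep 30

# Look up the new master; the reply is two lines: IP, then port
mapfile -t NEW_MASTER < <(redis-cli -p 26379 sentinel get-master-addr-by-name mymaster)
echo "New master: ${NEW_MASTER[0]}:${NEW_MASTER[1]}"

# Check replication status on the new master
echo "Checking replication status..."
redis-cli -h "${NEW_MASTER[0]}" -p "${NEW_MASTER[1]}" info replication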
