Smart Campus Robots: In-Service Rate Limiting and Circuit-Breaker Degradation in Practice (Sentinel + Redis Fallback)

Published: 2025-08-17

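Goal: without changing the front-end interface, add fine-grained rate limiting to the Gausium (高仙) OpenAPI proxy chain, give read endpoints a degraded cache fallback, and make write endpoints fail explicitly on errors without producing side effects.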

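I. Overview of the Approach

  • Push annotations down: move @SentinelResource from the Controller layer to the Service layer (GsOpenApiServiceImpl). Each business method maps to exactly one Sentinel resource name, which keeps governance in one place.

  • Resource naming convention (used to bind rules):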

    • gs.listRobots

    • gs.getRobotStatus

    • gs.postListRobotMap

    • gs.listSubareas

    • gs.listRobotCommands

    • gs.sendTempTask

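  • Separate handling for reads and writes

    • Read (query) endpoints: on rate limiting or circuit breaking → fall back to the cache (short TTL).

    • Write (task dispatch) endpoints: on rate limiting or circuit breaking → return an explicit failure (no caching, to prevent wrong or duplicate dispatch).

II. Service Changes (Core Code Skeleton)

1) Inject Redis and centralize key/JSON helpers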

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Slf4j
@Service
public class GsOpenApiServiceImpl implements GsOpenApiService {

    private final RestTemplate restTemplate;
    private final GsOpenApiProperties props;
    private final StringRedisTemplate redis;

    private static final ObjectMapper OM = new ObjectMapper();

    // Cache keys, managed in one place
    private static String kRobotList() { return "robot:list"; }
    private static String kRobotStatus(String sn){ return "robot:xxxx:" + sn; }
    private static String kRobotMapList(String sn){ return "robot:map:xxxx:" + sn; }
    private static String kSubareas(String mapId, String sn){ return "robot:map:xxxx:" + mapId + ":" + sn; }
    private static String kRobotCmds(String sn, int p, int s){ return "robot:cmds:" + sn + ":" + p + ":" + s; }

    // JSON helpers: serialization falls back to "{}", deserialization falls back to null
    private String toJson(Object o){ try { return OM.writeValueAsString(o); } catch (Exception e){ return "{}"; } }
    private <T> T fromJson(String s, TypeReference<T> t){ try { return OM.readValue(s, t); } catch (Exception e){ return null; } }

    // Constructor and token retrieval omitted...
}

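For read endpoints, we write to the cache on every normal return; when rate limiting or circuit breaking triggers, or an exception occurs, we read the cache as a fallback.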
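The read path can be sketched without Spring or Redis. The following is a minimal illustration only, assuming an in-memory map stands in for Redis and a `Supplier<String>` stands in for the upstream call; `ReadThroughFallback`, its `read` method, and the injected clock are hypothetical names that do not appear in the original service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical stand-in for the Redis-backed read fallback (not the article's class).
class ReadThroughFallback {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis; // injected so expiry can be checked without sleeping

    ReadThroughFallback(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    // Success: refresh the cache and return the body.
    // Failure: return the cached body if it has not expired, else the failure JSON.
    String read(String key, long ttlMillis, Supplier<String> upstream, String failureJson) {
        try {
            String body = upstream.get();
            if (body != null) {
                cache.put(key, new Entry(body, clockMillis.getAsLong() + ttlMillis));
            }
            return body;
        } catch (RuntimeException ex) {
            Entry cached = cache.get(key);
            boolean fresh = cached != null && cached.expiresAtMillis > clockMillis.getAsLong();
            return fresh ? cached.value : failureJson;
        }
    }
}
```

In the real service the two branches are split across the annotated method and its blockHandler/fallback methods; the sketch merges them only to show the data flow in one place.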

2) Read endpoint (example: status query), with cache fallback

@SentinelResource(
    value = "gs.getRobotStatus",
    blockHandler = "getRobotStatusBlockHandler",
    fallback = "getRobotStatusFallback"
)
public String getRobotStatus(String robotSn) {
    String api = props.getBaseUrl() + "/openapi/xxxxx/xx/xxxxx/" + xxxx + "/status";
    HttpHeaders headers = new HttpHeaders();
    headers.setBearerAuth(getToken());
    ResponseEntity<String> resp = restTemplate.exchange(api, HttpMethod.GET, new HttpEntity<>(headers), String.class);

    String body = resp.getBody();
    if (body != null) {
        // Write to the cache with a short TTL (3 minutes in this example); cache write failures are ignored
        try { redis.opsForValue().set(kRobotStatus(robotSn), body, 3, TimeUnit.MINUTES); } catch (Exception ignore) {}
    }
    return body;
}

// Rate limited or circuit broken → read the cache; if nothing is cached, return JSON with 429 semantics
public String getRobotStatusBlockHandler(String robotSn, BlockException ex) {
    log.warn("[getRobotStatus] blocked: {}, sn={}", ex.getClass().getSimpleName(), robotSn);
    String cached = redis.opsForValue().get(kRobotStatus(robotSn));
    return (cached != null) ? cached : "{\"code\":429,\"msg\":\"Rate limited or circuit open, and no cached data\"}";
}

// Method threw an exception → read the cache; if nothing is cached, return JSON with 503 semantics
public String getRobotStatusFallback(String robotSn, Throwable ex) {
    log.warn("[getRobotStatus] fallback: {}, sn={}", ex.toString(), robotSn);
    String cached = redis.opsForValue().get(kRobotStatus(robotSn));
    return (cached != null) ? cached : "{\"code\":503,\"msg\":\"Service error, and no cached data\"}";
}

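The other read endpoints (robot list / map list / subareas / command list) follow the same pattern: write the cache on a normal return, read the cache on failure. The TTL varies by endpoint:

  • listRobots: 5 min

  • postListRobotMap: 3 min

  • listSubareas: 10 min

  • listRobotCommands: 2 min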
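One way to keep these TTLs in a single place is a small lookup table keyed by the Sentinel resource name. The sketch below is an assumption about how this might be organized; `ReadCacheTtl` is a hypothetical class. The durations are the ones listed above, plus the 3 minutes used for getRobotStatus in the earlier example.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical lookup table: cache TTL per read resource.
class ReadCacheTtl {
    private static final Map<String, Duration> TTL = Map.of(
        "gs.listRobots", Duration.ofMinutes(5),
        "gs.getRobotStatus", Duration.ofMinutes(3),
        "gs.postListRobotMap", Duration.ofMinutes(3),
        "gs.listSubareas", Duration.ofMinutes(10),
        "gs.listRobotCommands", Duration.ofMinutes(2));

    // Write resources such as gs.sendTempTask are deliberately absent:
    // asking for their TTL is a programming error, so it throws.
    static Duration of(String resource) {
        Duration ttl = TTL.get(resource);
        if (ttl == null) {
            throw new IllegalArgumentException("no cache TTL for " + resource);
        }
        return ttl;
    }
}
```

Leaving the write resource out of the table makes "writes are never cached" something the code enforces, not just a convention.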

3) Write endpoint (example: temporary task without a site): no caching, explicit failure

@SentinelResource(
    value = "gs.sendTempTask",
    blockHandler = "sendTempTaskBlock",
    fallback = "sendTempTaskFallback"
)
public String sendTempTask(GsTempTaskDto dto) {
    String api = props.getBaseUrl() + "/openapi/xxxxx/xxxxxx/xxxxxxx";
    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.APPLICATION_JSON);
    headers.setBearerAuth(getToken());
    return restTemplate.postForEntity(api, new HttpEntity<>(dto, headers), String.class).getBody();
}

// Rate limited or circuit broken: fail explicitly, with no caching and no side effects
public String sendTempTaskBlock(GsTempTaskDto dto, BlockException ex) {
    log.warn("[sendTempTask] blocked: {}", ex.getClass().getSimpleName());
    return "{\"code\":429,\"msg\":\"Rate limited or circuit open, task not dispatched\"}";
}

// Method threw an exception: same explicit failure
public String sendTempTaskFallback(GsTempTaskDto dto, Throwable ex) {
    log.warn("[sendTempTask] fallback: {}", ex.toString());
    return "{\"code\":503,\"msg\":\"Service error, task not dispatched\"}";
}

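An explicit failure on write endpoints lets the front end show a clear message and lets the user decide whether to retry. It avoids both mistaking a failure for success and blind retries that dispatch the same task twice.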
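On the caller side, the explicit failure bodies can be told apart by their `code` field. The sketch below is a hypothetical illustration, not part of the original project: `DispatchResult` and its `Outcome` names are invented here, it uses a regular expression instead of a JSON parser to stay dependency-free, and it assumes that successful upstream bodies never carry code 429 or 503.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical caller-side classification of the sendTempTask response body.
class DispatchResult {
    enum Outcome { NOT_SENT_THROTTLED, NOT_SENT_ERROR, UPSTREAM_RESPONSE }

    // Matches the numeric value of the first "code" field in the body.
    private static final Pattern CODE = Pattern.compile("\"code\"\\s*:\\s*(\\d+)");

    static Outcome classify(String body) {
        if (body != null) {
            Matcher m = CODE.matcher(body);
            if (m.find()) {
                String code = m.group(1);
                if (code.equals("429")) return Outcome.NOT_SENT_THROTTLED; // blockHandler result
                if (code.equals("503")) return Outcome.NOT_SENT_ERROR;     // fallback result
            }
        }
        // Anything else is treated as a body that came back from the upstream API.
        return Outcome.UPSTREAM_RESPONSE;
    }
}
```

With this split, a UI could offer "retry later" for NOT_SENT_THROTTLED and a more cautious message for NOT_SENT_ERROR. A timeout does not prove that the upstream never received the task.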

III. RestTemplate and Unified Exception Handling

1) Timeouts and connection pool (so that slow calls and exceptions are detected accurately)

@Bean
public RestTemplate restTemplate() {
    var http = HttpClients.custom()
        .disableAutomaticRetries()
        .setMaxConnTotal(200)
        .setMaxConnPerRoute(50)
        .build();
    var f = new HttpComponentsClientHttpRequestFactory(http);
    f.setConnectTimeout(2000);
    f.setReadTimeout(5000);
    f.setConnectionRequestTimeout(2000);
    return new RestTemplate(f);
}

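Note: if you want 4xx/5xx responses to throw exceptions directly, do not register a custom ErrorHandler (RestTemplate's default handler already throws). If you prefer the "200 plus error body" style, keep an ErrorHandler that swallows errors and parse the body in business code. This article uses "throw an exception → fallback", which is clearer.

2) Global exception handling (optional, but recommended)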

@Slf4j
@RestControllerAdvice
public class GlobalExceptionHandler {
  // Sentinel blocks that are not absorbed by a blockHandler end up here
  @ExceptionHandler(BlockException.class)
  public AjaxResult onBlock(BlockException ex){
    return AjaxResult.error(429, "Flow control/degradation triggered: " + ex.getClass().getSimpleName());
  }
  // Last-resort handler for everything else
  @ExceptionHandler(Throwable.class)
  public AjaxResult onAny(Throwable ex, HttpServletRequest req){
    log.error("Unhandled ex, uri={}", req.getRequestURI(), ex);
    return AjaxResult.error(500, "System busy, please try again later");
  }
}

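If you also have URL-level rate limiting (not driven by annotations), you can define a custom UrlBlockHandler that returns HTTP 429 uniformly.

IV. Persisting Rules in Nacos (Flow/Degrade)

1) Application configuration (excerpt from ruoyi-robot-dev.yml, redacted)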

spring:
  cloud:
    sentinel:
      eager: true
      transport:
        dashboard: <SENTINEL_DASHBOARD_HOST:PORT>   # e.g. 127.0.0.1:8718
      datasource:
        flow:
          nacos:
            serverAddr: <NACOS_ADDR>               # e.g. 127.0.0.1:8848
            groupId: DEFAULT_GROUP
            username: <NACOS_USER>
            password: <NACOS_PASS>
            dataId: ruoyi-robot-flow-rules
            dataType: json
            ruleType: flow
        degrade:
          nacos:
            serverAddr: <NACOS_ADDR>
            groupId: DEFAULT_GROUP
            username: <NACOS_USER>
            password: <NACOS_PASS>
            dataId: ruoyi-robot-degrade-rules
            dataType: json
            ruleType: degrade

2) Day-to-day baseline (Flow rate limiting)

[
  {"resource":"gs.listRobots","grade":1,"count":5,"intervalSec":1},
  {"resource":"gs.getRobotStatus","grade":1,"count":10,"intervalSec":1},
  {"resource":"gs.postListRobotMap","grade":1,"count":6,"intervalSec":1},
  {"resource":"gs.listSubareas","grade":1,"count":6,"intervalSec":1},
  {"resource":"gs.listRobotCommands","grade":1,"count":8,"intervalSec":1},
  {"resource":"gs.sendTempTask","grade":1,"count":2,"intervalSec":1}
]

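grade=1 means the threshold is a QPS value. If you want to observe circuit breaking during load tests, raise these thresholds first; otherwise rate limiting will always trigger before the circuit breaker does. Note that intervalSec is a field of gateway flow rules; the standard FlowRule has no such field, so it is most likely ignored here and the QPS threshold is evaluated per second.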
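To make the meaning of a QPS rule concrete, here is a deliberately simplified fixed-window counter. It only illustrates "at most count passes per second". Sentinel itself relies on sliding-window statistics, and `QpsWindow` is a hypothetical class written for this explanation.

```java
import java.util.function.LongSupplier;

// Simplified illustration of a QPS threshold; not Sentinel's implementation.
class QpsWindow {
    private final int limit;                 // corresponds to "count" in the flow rule
    private final LongSupplier clockMillis;  // injected so the window can be tested without waiting
    private long currentSecond = Long.MIN_VALUE;
    private int passed;

    QpsWindow(int limit, LongSupplier clockMillis) {
        this.limit = limit;
        this.clockMillis = clockMillis;
    }

    // Returns true if the call may proceed, false if it would be rate limited.
    synchronized boolean tryPass() {
        long second = clockMillis.getAsLong() / 1000;
        if (second != currentSecond) { // a new one-second window starts: reset the counter
            currentSecond = second;
            passed = 0;
        }
        if (passed >= limit) {
            return false;
        }
        passed++;
        return true;
    }
}
```

With the gs.sendTempTask rule (count 2), the third call inside the same second is rejected, which is the moment the blockHandler would run in the real service.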

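3) Day-to-day baseline (Degrade circuit breaking)

Read endpoints use the "slow call ratio" strategy and the write endpoint uses the "exception ratio" strategy (the thresholds are examples only; tune them to the latency and stability of your own endpoints):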

[
  {"resource":"gs.getRobotStatus","grade":0,"count":800,"slowRatioThreshold":0.5,"minRequestAmount":20,"statIntervalMs":10000,"timeWindow":10},
  {"resource":"gs.listRobots","grade":0,"count":1200,"slowRatioThreshold":0.5,"minRequestAmount":20,"statIntervalMs":10000,"timeWindow":10},
  {"resource":"gs.postListRobotMap","grade":0,"count":1200,"slowRatioThreshold":0.5,"minRequestAmount":20,"statIntervalMs":10000,"timeWindow":10},
  {"resource":"gs.listSubareas","grade":0,"count":1200,"slowRatioThreshold":0.5,"minRequestAmount":20,"statIntervalMs":10000,"timeWindow":10},
  {"resource":"gs.listRobotCommands","grade":0,"count":1200,"slowRatioThreshold":0.5,"minRequestAmount":20,"statIntervalMs":10000,"timeWindow":10},
  {"resource":"gs.sendTempTask","grade":1,"count":0.2,"minRequestAmount":10,"statIntervalMs":10000,"timeWindow":5}
]

Note: the Nacos configuration must be pure JSON. Do not add // comments, or the rules will fail to load (we hit this once).
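The two strategies can be read as ratio checks over one statistics window. The sketch below is a simplified reading of the rule fields, not Sentinel's implementation: it ignores the sliding window and the half-open recovery probe after timeWindow, and `DegradeCheck` is a hypothetical helper.

```java
import java.util.List;

// Simplified trip conditions for the two degrade strategies used above.
class DegradeCheck {
    // Slow call ratio (grade 0): "count" is the maximum allowed RT in milliseconds.
    static boolean slowRatioTrips(List<Long> rtsMillis, long maxRtMillis,
                                  double slowRatioThreshold, int minRequestAmount) {
        if (rtsMillis.size() < minRequestAmount) {
            return false; // too few samples in the window to judge
        }
        long slow = rtsMillis.stream().filter(rt -> rt > maxRtMillis).count();
        return (double) slow / rtsMillis.size() > slowRatioThreshold;
    }

    // Exception ratio (grade 1): "count" is the ratio threshold, e.g. 0.2.
    static boolean exceptionRatioTrips(int total, int errors,
                                       double ratioThreshold, int minRequestAmount) {
        if (total < minRequestAmount) {
            return false;
        }
        return (double) errors / total > ratioThreshold;
    }
}
```

Read against the gs.getRobotStatus rule: with 20 calls in the 10-second window, 11 of them slower than 800 ms gives a ratio of 0.55, which exceeds 0.5 and opens the circuit for timeWindow = 10 seconds.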

V. Integration and Self-Testing Notes (with placeholders)

  • Go through the gateway (do not call the service directly):

    • Robot list: GET <GW>/external/gs/xxxxx

    • Robot status: GET <GW>/external/gs/xxxxx

    • Map list (form): POST <GW>/external/gs/map/xxxxxx, Content-Type: application/x-www-form-urlencoded, Body: robotSn=<ROBOT_SN>

    • Subarea query (JSON): POST <GW>/external/gs/map/xxxxx, Body: {"mapId":"<MAP_ID>","robotSn":"<ROBOT_SN>"}

    • Dispatch a temporary task without a site (JSON): POST <GW>/external/gs/xxxxx, Body: {"robotSn":"<ROBOT_SN>","taskName":"<TASK>","mapId":"<MAP_ID>","subareaId":"<SUB_ID>","loop":false,"times":1}

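  • During concurrency or load tests, if you want to verify circuit breaking and not rate limiting:

    1. Raise the Flow (QPS) thresholds;

    2. Use a very small Degrade RT threshold, or temporarily add sleep(...) inside the method;

    3. Check that the logs show DegradeException (circuit breaking) and not FlowException (rate limiting).

  • Expected logs:

    • Rate limiting triggered: FlowException

    • Circuit breaking triggered: DegradeException

    • When a read endpoint is blocked, you can see the response being served from the cache; when a write endpoint is blocked, it returns the explicit failure JSON.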


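VI. Common Pitfalls and Fixes

  • Nacos JSON must not contain comments, or the rules fail to load.

  • POST form vs JSON: robotMap must use application/x-www-form-urlencoded; the subarea query uses JSON.

  • @SentinelResource handler signatures must match the original method (same parameters and return type), with BlockException (for blockHandler) or Throwable (for fallback) appended as the last parameter.

  • Resource name alignment: the resource in the rules delivered through the dataId must equal @SentinelResource.value.

  • Gateway and in-service protection enabled at the same time: whichever reaches its threshold first takes effect. To test in-service circuit breaking, temporarily relax the gateway rate limit, or make the in-service thresholds more sensitive.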
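The resource-name pitfall is easy to catch with a small set comparison before publishing rules. The sketch below is a hypothetical helper (`RuleNameCheck` is not part of the project) and assumes you can list the @SentinelResource values and the rule resource names as plain strings.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-publish check for rule/annotation name mismatches.
class RuleNameCheck {
    // Returns the rule resources that match no @SentinelResource value.
    // Names are compared exactly, so a difference in letter case counts as a mismatch.
    static Set<String> unmatched(Set<String> annotatedValues, Set<String> ruleResources) {
        Set<String> result = new TreeSet<>(ruleResources);
        result.removeAll(annotatedValues);
        return result;
    }
}
```

A non-empty result points at rules that would load without error but never apply to any guarded method.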


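VII. Definition of Done (DoD)

  • During load testing, FlowException and DegradeException show up reliably in the logs;

  • Read endpoints fall back to the cache under rate limiting or circuit breaking, and return a clear 429/503 JSON when nothing is cached;

  • Write endpoints return a uniform failure JSON under rate limiting, circuit breaking, or exceptions, and produce no side effects;

  • Rule changes in Nacos take effect without a restart, and the rules are restored to the day-to-day baseline after testing.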


Appendix: Redaction Placeholders

  • <GW>: your gateway address (e.g. http://localhost:8080)

  • <NACOS_ADDR>: Nacos address (e.g. 127.0.0.1:8848)

  • <SENTINEL_DASHBOARD_HOST:PORT>: Sentinel dashboard (e.g. 127.0.0.1:8718)

  • <NACOS_USER> / <NACOS_PASS>: Nacos credentials

  • <ROBOT_SN>: robot serial number (e.g. GS***-****)

  • <MAP_ID> / <SUB_ID>: map / subarea ID

  • <TASK>: task name

