Hands-On Microservices: Timeout and Retry Mechanisms of the Spring Cloud Load-Balancing Components loadbalancer and Ribbon

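1. Overview

1.1 Goals

Service A calls services B1 and B2 (B1 and B2 provide the same service). While B1/B2 is being stopped and redeployed, or when one of B1/B2 fails:

  • Service A must still be able to call service B normally, so that deployments go unnoticed (high availability of service B).
  • Requests from service A must be load-balanced, so that no single B node is overloaded (load balancing of service B).
  • The main purpose is to verify the timeout and retry mechanisms of service calls.

Note: Nacos is used as the service registration and discovery component.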

1.2 Environment

  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
  <spring.boot.version>2.2.2.RELEASE</spring.boot.version>
  <spring.cloud.version>Hoxton.SR1</spring.cloud.version>
  <spring.alibaba.version>2.1.0.RELEASE</spring.alibaba.version>

Consumer side: Ribbon has been excluded and the officially recommended loadbalancer is used instead.

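2. Service call timeout and retry cases

2.1 Service provider: provider-user

Service details as registered in Nacos

Note: two instances of provider-user are started, provider-user--3015 and provider-user--4015.

Server-side code

2.2 Service consumer: provider-order

The retry endpoint uses the default configuration: PoolingHttpClientConnectionManager

The retry2 endpoint uses a custom configuration: RestTemplate

Configuration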

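2.3 Load-balancing test

Start one consumer instance, provider-order--3017.

Call retry and retry2 on provider-order--3017 several times. The logs confirm that, by default, a round-robin load-balancing strategy is used to call provider-user--3015 and provider-user--4015.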
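
Round-robin selection, as observed in the logs, can be sketched in a few lines of plain Java. This is a minimal illustration only, not the code that Ribbon or loadbalancer actually runs; the class name `RoundRobinChooser` is hypothetical, and the instance labels are taken from this article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of round-robin instance selection. */
final class RoundRobinChooser {
    private final List<String> instances;
    private final AtomicInteger position = new AtomicInteger(0);

    RoundRobinChooser(List<String> instances) {
        this.instances = instances;
    }

    /** Returns the next instance, cycling through the list in order. */
    String choose() {
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        RoundRobinChooser chooser = new RoundRobinChooser(
                Arrays.asList("provider-user--3015", "provider-user--4015"));
        // Four consecutive picks alternate between the two instances
        for (int i = 0; i < 4; i++) {
            System.out.println(chooser.choose());
        }
    }
}
```

With two instances, consecutive calls alternate between them, which matches the pattern seen in the consumer logs.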

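2.4 High-availability test

Stop one provider-user instance (provider-user-4015) and confirm that, when round-robin selects the stopped instance, the request is automatically retried and succeeds on the instance that is still running.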

2.5 ribbon.restclient.enabled

1. Without ribbon.restclient.enabled=true

provider-order--3017, /retry endpoint: the call times out with an error and is not retried.

The error is raised at the 5-second timeout; this endpoint goes through the shared PoolingHttpClientConnectionManager:

  2024-08-05 20:40:53.150[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
  2024-08-05 20:40:53.155[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015

provider-order--3017, /retry2 endpoint: no error is raised even when the provider sleeps for 7 seconds, and no retry takes place.

retry2 uses its own separately configured RestTemplate. Consumer log:

  2024-08-05 20:43:57.405[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String

Provider log:

  2024-08-05 20:29:41.267[] user [http-nio-0.0.0.0-3015-exec-9] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
  2024-08-05 20:29:51.358[] user [http-nio-0.0.0.0-3015-exec-8] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
  2024-08-05 20:30:23.498[] user [http-nio-0.0.0.0-3015-exec-7] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
  2024-08-05 20:30:31.393[] user [http-nio-0.0.0.0-3015-exec-6] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
  2024-08-05 20:30:38.764[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
  2024-08-05 20:31:00.140[] user [http-nio-0.0.0.0-3015-exec-1] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
  2024-08-05 20:31:07.552[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
  2024-08-05 20:31:15.993[] user [http-nio-0.0.0.0-3015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
  2024-08-05 20:31:24.517[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 6s

2. With ribbon.restclient.enabled=true, there are three cases

Case 1: only one provider-user instance is running

With ribbon.restclient.enabled=true, retry still times out directly without retrying, while retry2 is sent 6 times in total (timeout as logged: -240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000).

The log contains 6 occurrences of "RestClient sending new Request(GET", for example: com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String

  2024-08-05 21:04:15.198[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.servlet.DispatcherServlet-91- GET "/order/api/v1/retry2?name=String", parameters={masked}
  2024-08-05 21:04:15.201[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG o.s.w.s.m.m.a.RequestMappingHandlerMapping-412- Mapped to com.zxx.study.cloud.order.controller.RestfulApiController#retry2(String)
  2024-08-05 21:04:15.206[] order [http-nio-0.0.0.0-3017-exec-7] INFO c.z.s.cloud.order.controller.RestfulApiController-255- name=String
  2024-08-05 21:04:15.207[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String
  2024-08-05 21:04:15.209[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- Accept=[application/json, application/*+json]
  2024-08-05 21:04:15.210[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.ZoneAwareLoadBalancer-112- Zone aware logic disabled or there is only one zone
  2024-08-05 21:04:15.211[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.LoadBalancerContext-551- using LB returned Server: 192.168.1.4:3015 for request: http://provider-user/user/api/v1/retry?name=String
  2024-08-05 21:04:15.212[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
  2024-08-05 21:04:15.213[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.http4.MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000

Only one provider-user instance is running. The first call (provider sleep of 5 s) times out and 5 retries follow, 6 calls in total; MaxAutoRetries: 3 + MaxAutoRetriesNextServer: 2.

  2024-08-05 21:04:15.227[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= eaa55510-140d-4f5d-bf23-8adf9a620646
  2024-08-05 21:04:15.228[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
  2024-08-05 21:04:18.264[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ba0e6cc8-4cf6-41fe-91eb-42ec3d2e60d2
  2024-08-05 21:04:18.265[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
  2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ff3fba7e-4798-44b8-a25e-b84e75fb828a
  2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
  2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= f261909b-d752-42ac-b1f9-47a1747481cc
  2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
  2024-08-05 21:04:27.386[] user [http-nio-0.0.0.0-3015-exec-7] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= fcb7d6f7-a997-4629-9901-ebf894758a02
  2024-08-05 21:04:27.387[] user [http-nio-0.0.0.0-3015-exec-7] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
  2024-08-05 21:04:30.409[] user [http-nio-0.0.0.0-3015-exec-8] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= e6f13159-5111-4f0a-babe-3f9b6d8eff61
  2024-08-05 21:04:30.410[] user [http-nio-0.0.0.0-3015-exec-8] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s

Case 2: only one provider-user instance is running

With ribbon.restclient.enabled=true, retry still times out directly without retrying, while retry2 is sent 3 times in total (timeout as logged: -MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 5000).

The log contains 3 occurrences of "RestClient sending new Request(GET", for example: com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String

retry still times out directly without retrying:

  2024-08-05 21:24:05.636[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
  2024-08-05 21:24:05.637[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
  2024-08-05 21:24:05.643[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] <--- ERROR SocketTimeoutException: Read timed out (5085ms)
  2024-08-05 21:24:05.648[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] java.net.SocketTimeoutException: Read timed out

For retry2, the first call (provider sleep of 6 s) times out and 2 retries follow, 3 calls in total; MaxAutoRetries: 2 + MaxAutoRetriesNextServer: 1.

  2024-08-05 21:18:22.255[] user [http-nio-0.0.0.0-3015-exec-1] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= dc2edb32-25a6-465c-9577-07b59388670f
  2024-08-05 21:18:22.256[] user [http-nio-0.0.0.0-3015-exec-1] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
  2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 05c593da-6b35-4f7b-8df2-15e9dab7b391
  2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
  2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 07fc8c11-5d77-4ffe-a81c-9851f68a647e
  2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s

Case 3: two provider-user instances are running

The log contains 5 occurrences of com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ), i.e. 5 calls in total; MaxAutoRetries: 2 + MaxAutoRetriesNextServer: 1.

The first call to provider-user-4015 (provider sleep of 5 s) times out; it is then retried twice on provider-user-4015, and two further attempts are made on provider-user-3015, 5 calls in total.

provider-user-3015, 2 calls:

  2024-08-05 21:34:14.407[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 38c1f932-e99f-49c1-889d-aa79af316089
  2024-08-05 21:34:14.409[] user [http-nio-0.0.0.0-3015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
  2024-08-05 21:34:19.456[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 410f7b33-46ac-4102-a0ff-3c19c18d2b52
  2024-08-05 21:34:19.457[] user [http-nio-0.0.0.0-3015-exec-5] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s

provider-user-4015, 3 calls:

  2024-08-05 21:33:59.263[] user [http-nio-0.0.0.0-4015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 97936999-48f0-4257-9fce-7a78081afa4b
  2024-08-05 21:33:59.264[] user [http-nio-0.0.0.0-4015-exec-2] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
  2024-08-05 21:34:04.308[] user [http-nio-0.0.0.0-4015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ac766122-6a7a-4535-b1fb-928e3a9a5f7f
  2024-08-05 21:34:04.309[] user [http-nio-0.0.0.0-4015-exec-3] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 7s
  2024-08-05 21:34:09.338[] user [http-nio-0.0.0.0-4015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 02283f9e-fddb-480a-a9d2-68eb3da988ac
  2024-08-05 21:34:09.339[] user [http-nio-0.0.0.0-4015-exec-4] INFO c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
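
One way to read the call counts in the logs: Ribbon nests same-server retries inside next-server retries, so a request can be attempted at most (MaxAutoRetries + 1) × (MaxAutoRetriesNextServer + 1) times, and the loop stops as soon as one attempt succeeds. The sketch below replays the logged provider sleep values against that model. It is an interpretation under stated assumptions, not Ribbon's code and not something the tests above prove: `RetrySimulation` and `countAttempts` are hypothetical names, the read timeout is assumed to be 5000 ms, and an attempt is assumed to succeed only when the provider sleeps for less than the timeout.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical replay of the logged provider sleeps against a nested-retry model. */
final class RetrySimulation {

    /**
     * Counts attempts until one succeeds or the retry budget is exhausted.
     * sleepsSeconds holds the provider-side sleep of each successive attempt.
     */
    static int countAttempts(List<Integer> sleepsSeconds, int readTimeoutMs,
                             int maxAutoRetries, int maxAutoRetriesNextServer) {
        int attempts = 0;
        // Outer loop: the first server plus up to maxAutoRetriesNextServer further picks
        for (int server = 0; server <= maxAutoRetriesNextServer; server++) {
            // Inner loop: the first call plus up to maxAutoRetries retries on that server
            for (int retry = 0; retry <= maxAutoRetries; retry++) {
                if (attempts >= sleepsSeconds.size()) {
                    return attempts; // no more logged data to replay
                }
                int sleepMs = sleepsSeconds.get(attempts) * 1000;
                attempts++;
                if (sleepMs < readTimeoutMs) {
                    return attempts; // response arrived before the read timeout
                }
            }
        }
        return attempts; // budget exhausted, every attempt timed out
    }

    public static void main(String[] args) {
        // Case 2: one instance, provider sleeps of 6s, 6s, 4s
        System.out.println(countAttempts(Arrays.asList(6, 6, 4), 5000, 2, 1));
        // Case 3: 4015 sleeps 5s, 7s, 5s, then 3015 sleeps 6s, 4s
        System.out.println(countAttempts(Arrays.asList(5, 7, 5, 6, 4), 5000, 2, 1));
    }
}
```

Under these assumptions the replay yields 3 attempts for case 2 and 5 for case 3, the same counts as in the logs, because the last logged call in each case sleeps only 4 s and would finish inside the timeout. Under the same model, six attempts that all time out would correspond to MaxAutoRetries = 2 and MaxAutoRetriesNextServer = 1, so the values noted for case 1 may be worth re-checking against the actual configuration.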

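2.6 Summary

1. Use the retry mechanism with caution, even for GET requests; for other methods, retries are best avoided. With OkToRetryOnAllOperations: false, retries on any error apply only to GET (section 3 shows that other methods are still retried on connection errors); with true, POST, PUT, DELETE and other methods are retried as well.

2. If retries are unavoidable, configure them per service and make sure the endpoints are idempotent.

3. In the tests above, ribbon.restclient.enabled=true is the switch that turned retries on.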
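
Scoping retries to a single downstream service, as point 2 suggests, can be done by prefixing the Ribbon keys with the client name instead of using the global `ribbon.` prefix. The fragment below is only a sketch: the client name `provider-user` comes from this article, but the values are illustrative assumptions and not the configuration used in the tests.

```properties
# Retry settings that apply only to calls to provider-user (values are illustrative)
provider-user.ribbon.ReadTimeout = 5000
provider-user.ribbon.MaxAutoRetries = 0
provider-user.ribbon.MaxAutoRetriesNextServer = 1
provider-user.ribbon.OkToRetryOnAllOperations = false
```

Keeping the settings per client means a retry budget chosen for one idempotent service does not silently apply to every other service the consumer calls.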

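3. FeignLoadBalancer analysis

Tracing the source code shows that FeignLoadBalancer sets up the retry policy. If ribbon.OkToRetryOnAllOperations is true, requests of any method are retried. If it is false, GET requests are still retried, whereas non-GET requests are retried only on connection errors.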

    @Override
    public RequestSpecificRetryHandler getRequestSpecificRetryHandler(
            RibbonRequest request, IClientConfig requestConfig) {
        // OkToRetryOnAllOperations=true: retry any request method on any error
        if (this.ribbon.isOkToRetryOnAllOperations()) {
            return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(),
                    requestConfig);
        }
        // OkToRetryOnAllOperations=false (the default):
        // non-GET requests are retried only on connection errors
        if (!request.toRequest().method().equals("GET")) {
            return new RequestSpecificRetryHandler(true, false, this.getRetryHandler(),
                    requestConfig);
        } else {
            // GET requests are retried on any error
            return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(),
                    requestConfig);
        }
    }

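The analysis above shows that setting ribbon.OkToRetryOnAllOperations=false does not disable retries: Ribbon still retries GET requests. Our system has no special configuration for Ribbon's retry mechanism, so the defaults apply.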
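
The branching in getRequestSpecificRetryHandler reduces to the two boolean flags passed to `RequestSpecificRetryHandler`: whether to retry on connection errors and whether to retry on all errors. A minimal plain-Java model of that decision follows. `RetryDecision` and `shouldRetry` are hypothetical names used only for illustration, and the model deliberately ignores the retry counts.

```java
/** Hypothetical model of the retry decision described above (not Ribbon code). */
final class RetryDecision {

    /**
     * @param httpMethod               for example "GET" or "POST"
     * @param okToRetryOnAllOperations value of ribbon.OkToRetryOnAllOperations
     * @param connectError             true if the failure was a connection error,
     *                                 false for other errors such as a read timeout
     */
    static boolean shouldRetry(String httpMethod, boolean okToRetryOnAllOperations,
                               boolean connectError) {
        // GET, or any method when OkToRetryOnAllOperations=true: retry on all errors
        boolean retryOnAllErrors = okToRetryOnAllOperations || "GET".equals(httpMethod);
        // Otherwise only connection errors are retried
        return retryOnAllErrors || connectError;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry("GET", false, false));  // true
        System.out.println(shouldRetry("POST", false, false)); // false
        System.out.println(shouldRetry("POST", false, true));  // true
        System.out.println(shouldRetry("POST", true, false));  // true
    }
}
```

The second line is the only combination that is not retried: a non-GET request that fails with something other than a connection error while OkToRetryOnAllOperations is false.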

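Ribbon's default retry configuration is:

  # Maximum number of retries on the same instance, not counting the first call. Default: 0
  ribbon.MaxAutoRetries = 0
  # Maximum number of other instances of the same service to retry on, not counting the first instance. Default: 1
  ribbon.MaxAutoRetriesNextServer = 1
  # Whether all operations may be retried. Default: false
  ribbon.OkToRetryOnAllOperations = false

Because MaxAutoRetriesNextServer defaults to 1 and our import endpoint happens to be a GET request, Ribbon automatically retries once when the business service takes too long to process the data and the call times out.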
