
Netflix Ribbon and Hystrix Timeout

We are using Spring Cloud in our project. We have several microservices, and each has its own .yml file.

The properties below are set only in the Zuul server:

    hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds: 60000

    ribbon:
      ConnectTimeout: 3000
      ReadTimeout: 60000

Test 1:

Accounts Service:

This is the service I call to test the timeout. The request goes through Zuul, i.e., on port 8006.

    @RequestMapping(value = "/accountholders/{cardHolderId}/accounts", produces = "application/json; charset=utf-8", method = RequestMethod.GET)
    @ResponseBody
    public AllAccountsVO getAccounts(@PathVariable("cardHolderId") final String cardHolderId,
            @RequestHeader("userContextId") final String userContextId,
            @RequestParam final MultiValueMap<String, String> allRequestParams, final HttpServletRequest request) {

        return iAccountService.getCardHolderAccountsInfo(cardHolderId, userContextId, request, allRequestParams,
                ApplicationConstants.ACCOUNTHOLDER);
    }

The above service internally calls the one below using Spring RestTemplate. I started testing by adding a 5000 ms sleep to Association Service, as shown below, and then made a request to Accounts Service (the getAccounts call).

Association Service:

    @RequestMapping(value = "/internal/userassociationstatus", produces = "application/json; charset=utf-8", consumes = "application/json", method = RequestMethod.GET)
    @ResponseBody
    public UserAssociationStatusVO getUserAssociationStatus(@RequestParam final Map<String, String> allRequestParams) {
        try {
            // Artificial delay to simulate a slow response
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the interruption
            Thread.currentThread().interrupt();
        }
        return iUserAssociationsService.getUserAssociationStatus(allRequestParams);
    }

Below is the error I got in Association Service

org.apache.catalina.connector.ClientAbortException: java.io.IOException: An established connection was aborted by the software in your host machine
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:393) ~[tomcat-embed-core-8.0.30.jar:8.0.30]
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:426) ~[tomcat-embed-core-8.0.30.jar:8.0.30]
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:342) ~[tomcat-embed-core-8.0.30.jar:8.0.30]

Below is the error I got in Accounts Service

org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://USERASSOCIATIONS-V1/user/v1/internal/userassociationstatus?cardholderid=123&usercontextid=222&role=ACCOUNT": com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out; nested exception is java.io.IOException: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:607) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:557) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:475) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]

With a sleep time of 4500 ms I get a response, but with 4800 ms or more it throws the above exception. I suspect this is not related to the Ribbon timeouts but to something else. Is there a specific reason the exception appears beyond a certain point?
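The threshold behaviour described here is what a client-side read timeout looks like: the call succeeds while the server answers within the limit and fails with "Read timed out" once it does not. The sketch below reproduces that pattern with plain JDK classes only. It is not the Ribbon or RestTemplate code path from the question; the class name, the `/slow` path, and the scaled-down millisecond values are all made up for illustration, and it needs Java 9 or later. Whether the limit hit in Accounts Service is a read timeout of about 5 seconds on its own client is an assumption that the question's configuration does not confirm.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ReadTimeoutThresholdDemo {

    /**
     * Starts a throwaway local HTTP server whose handler sleeps for delayMs,
     * then calls it with the given client read timeout.
     * Returns the response body, or "read timed out" if the client gave up.
     */
    static String call(long delayMs, int readTimeoutMs) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/slow", exchange -> {
                try {
                    Thread.sleep(delayMs); // plays the role of Thread.sleep in the controller
                    byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
                    exchange.sendResponseHeaders(200, body.length);
                    exchange.getResponseBody().write(body);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    exchange.close();
                }
            });
            server.start();
            try {
                URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/slow");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setConnectTimeout(1000);
                conn.setReadTimeout(readTimeoutMs); // limit on waiting for response data
                try (InputStream in = conn.getInputStream()) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                } catch (SocketTimeoutException e) {
                    return "read timed out";
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Server answers well inside the client's read timeout
        System.out.println(call(100, 2000));
        // Server takes longer than the client is willing to wait
        System.out.println(call(1500, 300));
    }
}
```

In a test like this the server keeps working after the client has given up, which matches the ClientAbortException seen on the Association Service side: the response is written to a connection the caller has already closed.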

Test 2:

Then I put a sleep of 75000 ms directly in Accounts Service and removed the sleep from Association Service.

    @RequestMapping(value = "/accountholders/{cardHolderId}/accounts", produces = "application/json; charset=utf-8", method = RequestMethod.GET)
    @ResponseBody
    public AllAccountsVO getAccounts(@PathVariable("cardHolderId") final String cardHolderId,
            @RequestHeader("userContextId") final String userContextId,
            @RequestParam final MultiValueMap<String, String> allRequestParams, final HttpServletRequest request) {

        try {
            // Artificial delay, longer than the 60000 ms Hystrix timeout set in Zuul
            Thread.sleep(75000);
        } catch (InterruptedException ex) {
            // Restore the interrupt flag instead of swallowing the interruption
            Thread.currentThread().interrupt();
        }
        return iAccountService.getCardHolderAccountsInfo(cardHolderId, userContextId, request, allRequestParams,
                ApplicationConstants.ACCOUNTHOLDER);
    }

In this case the response contained "exception": "com.netflix.zuul.exception.ZuulException".

And in my API gateway (Zuul application) log I see the error below:

com.netflix.zuul.exception.ZuulException: Forwarding error
    at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.forward(RibbonRoutingFilter.java:134) ~[spring-cloud-netflix-core-1.1.0.M5.jar:1.1.0.M5]
    at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.run(RibbonRoutingFilter.java:76) ~[spring-cloud-netflix-core-1.1.0.M5.jar:1.1.0.M5]
    at com.netflix.zuul.ZuulFilter.runFilter(ZuulFilter.java:112) ~[zuul-core-1.1.0.jar:1.1.0]
    at com.netflix.zuul.FilterProcessor.processZuulFilter(FilterProcessor.java:197) ~[zuul-core-1.1.0.jar:1.1.0]


Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: useraccounts-v1RibbonCommand timed-out and no fallback available.
    at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:806) ~[hystrix-core-1.4.23.jar:1.4.23]
    at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:790) ~[hystrix-core-1.4.23.jar:1.4.23]
    at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:99) ~[rxjava-1.0.14.jar:1.0.14]
    at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70) ~[rxjava-1.0.14.jar:1.0.14]

I think this has nothing to do with the Ribbon ConnectTimeout or ReadTimeout. The error is caused by the property "execution.isolation.thread.timeoutInMilliseconds: 60000". I also reduced this property to 10000 ms to test the behavior and got the same exception when the sleep time was longer (e.g., 12000 ms).
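The behaviour in Test 2 fits a timeout that wraps the whole call rather than the socket: the caller stops waiting after a fixed time, no matter what the HTTP client underneath is doing. The sketch below imitates that idea with a plain `ExecutorService` and `Future.get` with a timeout. It does not use Hystrix and is not how Hystrix is implemented; the class name, method names, and the scaled-down values (600 ms standing in for 60000 ms, 750 ms for 75000 ms) are assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CommandTimeoutSketch {

    /** Hypothetical stand-in for the routed call: sleeps, then returns a response. */
    static Callable<String> slowCall(long sleepMs) {
        return () -> {
            Thread.sleep(sleepMs);
            return "response";
        };
    }

    /**
     * Runs the work on a separate thread and waits at most timeoutMs for it.
     * The limit covers the entire call, independent of any socket-level timeout.
     */
    static String runWithTimeout(Callable<String> work, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(work);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller has stopped waiting
            return "timed-out and no fallback available";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Work finishes inside the overall limit
        System.out.println(runWithTimeout(slowCall(50), 600));
        // Work outlasts the overall limit, like the 75000 ms sleep against a 60000 ms timeout
        System.out.println(runWithTimeout(slowCall(750), 600));
    }
}
```

With two limits stacked like this, the shorter one decides the outcome, so a 60000 ms ReadTimeout cannot be observed while the surrounding command timeout is also 60000 ms or less.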

I want to understand the Ribbon ConnectTimeout and ReadTimeout versus the Hystrix timeout, and how to test the Ribbon timeouts in my application. Also, if I want different timeouts for different microservices, do I keep these properties in the respective .yml files? Any thoughts?

I'm trying to create a document for my team so that developers can easily see how these timeout options work in Spring Cloud.

(It's a lengthy description, but I had to write in detail to make it clear.)

Arun asked Aug 25 '16 20:08




1 Answer

The ConnectTimeout and ReadTimeout in Ribbon are passed down to the underlying HTTP client. They apply to the HTTP connection (not to the HTTP request once the connection has been established). I'm not sure why you'd need to test them like this, but it's going to be hard with a healthy server. For instance, for ConnectTimeout you need a server that can accept TCP connections but not finish the HTTP layer connection. For ReadTimeout you need one that makes a connection but then doesn't send any data (at all).
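As a rough illustration of the kind of unhealthy server described above, the sketch below opens a listening socket that never accepts or answers. The operating system still completes the TCP handshake through the listen backlog, so the client connects and then waits for data that never arrives. It uses only `java.net` sockets, not Ribbon; the class and method names are made up for the example, and the timeout values are arbitrary.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class SilentServerDemo {

    /**
     * Connects to a local server that never sends a byte and reports what happened.
     * connectTimeoutMs bounds the TCP connect; readTimeoutMs bounds each blocking read.
     */
    static String readFromSilentServer(int connectTimeoutMs, int readTimeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // accept() is never called; the handshake completes via the listen backlog
            client.connect(
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort()),
                    connectTimeoutMs);
            client.setSoTimeout(readTimeoutMs);
            // Send a request so the client looks like a normal HTTP caller
            client.getOutputStream()
                    .write("GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            try {
                int first = client.getInputStream().read();
                return "got byte " + first;
            } catch (SocketTimeoutException e) {
                return "connected, then read timed out";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFromSilentServer(1000, 300));
    }
}
```

Exercising the connect timeout is harder to do portably, because it needs an address where the TCP handshake itself stalls rather than being refused, so it is left out of this sketch.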

Dave Syer answered Oct 21 '22 17:10