Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ZuulProxy fails with "RibbonCommand timed-out and no fallback available" when it should do failover

Tags:

spring-cloud

Short description: I'm trying to get a ZuulProxy to handle instance failover but it throws ZuulException: Forwarding error, instead of responding with a result from a working instance.

Long description: My setup is one standalone Eureka Server, one ConfigServer, one ZuulProxy (@EnableZuulProxy) and two service instances, all of them registred in Eureka.

Everything is running with spring-cloud-starter-parent Angel.SR3

My servicediscovery:

@SpringBootApplication
@EnableEurekaServer
public class EurekaServer {
    public static void main(String[] args) {
        SpringApplication.run(EurekaServer.class, args);
    }
}

My ConfigServer:

@SpringBootApplication
@EnableAutoConfiguration
@EnableConfigServer
@ComponentScan
@EnableDiscoveryClient
public class ConfigserverApplication {

  public static void main(String[] args) {
    SpringApplication.run(ConfigserverApplication.class, args);
  }
}

My ZuulProxy:

@SpringBootApplication
@EnableAutoConfiguration
@ComponentScan
@EnableDiscoveryClient
@EnableZuulProxy
public class ZuulProxy {
  public static void main(String[] args) {
    SpringApplication.run(ZuulProxy.class, args);
  }
}

The routing-rules in zuul:

zuul.ignoredServices=*
zuul.routes.examplems=/example/**

My service instances:

@SpringBootApplication
@Configuration
@EnableAutoConfiguration
@EnableDiscoveryClient
@ComponentScan(basePackages = "se.example.microservices")
@EnableSwagger
public class Application {

  public static void main(String[] args) throws Exception{
    SpringApplication.run(Application.class, args);
  }

My service instances register themselves using spring.application.name=examplems

Everything works fine when I fire up two service instances and shoot requests through the zuulproxy, it round robins the requests to my two service instances. But when I stop one of the instances Zuul still tries to forward the request to the stopped instance several times and then it fails with:

com.netflix.zuul.exception.ZuulException: Forwarding error
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: examplemsRibbonCommand timed-out and no fallback available.

I was expecting the request to the stopped instance to time out and fail over to a running instance transparently. The really strange thing is that it seems like zuul (according to logs) first tries with the stopped instance several times (which is of course failing) and then it forwards the request to the working instance, gets an OK answer but instead of forwarding the OK answer to the client it throws an exception resulting in a failed response with status 500!?

Please see the logs (my working instance is on host PMD11286 and my stopped instance is on PMD11933):

2015-08-20 08:45:46.343  INFO 7700 --- [nio-9050-exec-9] o.s.c.n.zuul.filters.ProxyRouteLocator   : Finding route for path: /example/ping/delay
2015-08-20 08:45:46.343 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Connection 0.0.0.0:50251<->172.20.120.39:9060 closed
2015-08-20 08:45:46.343 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:47.372 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:47.372 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:48.386 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:48.386 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:49.416 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:49.416 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:50.430 DEBUG 7700 --- [N_MANAGER_TIMER] o.a.h.i.c.t.ThreadSafeClientConnManager  : Closing expired connections
2015-08-20 08:45:50.430 DEBUG 7700 --- [N_MANAGER_TIMER] o.a.h.impl.conn.tsccm.ConnPoolByRoute    : Closing expired connections
2015-08-20 08:45:50.446 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:50.446 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:51.475 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:51.475 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:52.505 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Connection 0.0.0.0:50251<->172.20.120.39:9060 closed
2015-08-20 08:45:52.505 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Connection 0.0.0.0:50251<->172.20.120.39:9060 shut down
2015-08-20 08:45:52.505 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Connection 0.0.0.0:50251<->172.20.120.39:9060 closed
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.client.protocol.RequestAddCookies  : CookieSpec selected: ignoreCookies
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.client.protocol.RequestAuthCache   : Auth cache not set in the context
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.c.p.RequestTargetAuthentication    : Target auth state: UNCHALLENGED
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.c.p.RequestProxyAuthentication     : Proxy auth state: UNCHALLENGED
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Sending request: GET /ping/delay HTTP/1.1
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "GET /ping/delay HTTP/1.1[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "x-forwarded-host: 127.0.0.1:9050[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "x-forwarded-prefix: /example[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "accept-encoding: deflate, gzip[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "user-agent: curl/7.42.1[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "accept: */*[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "Netflix.NFHttpClient.Version: 1.0[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "X-netflix-httpclientname: examplems[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "Host: PMD11286:9060[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "Connection: Keep-Alive[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> GET /ping/delay HTTP/1.1
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> x-forwarded-host: 127.0.0.1:9050
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> x-forwarded-prefix: /example
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> accept-encoding: deflate, gzip
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> user-agent: curl/7.42.1
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> accept: */*
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> Netflix.NFHttpClient.Version: 1.0
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> X-netflix-httpclientname: examplems
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> Host: PMD11286:9060
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> Connection: Keep-Alive
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "HTTP/1.1 200 OK[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Server: Apache-Coyote/1.1[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "X-Application-Context: examplems:9060[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Content-Type: text/plain;charset=UTF-8[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Content-Length: 76[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Date: Thu, 20 Aug 2015 06:45:52 GMT[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Receiving response: HTTP/1.1 200 OK
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << HTTP/1.1 200 OK
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << Server: Apache-Coyote/1.1
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << X-Application-Context: examplems:9060
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << Content-Type: text/plain;charset=UTF-8
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << Content-Length: 76
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << Date: Thu, 20 Aug 2015 06:45:52 GMT
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Svar efter 100 ms v[0xc3][0xa4]ntan. Kan [0xc3][0xa4]ndras med: ?time=200  15-08-20 08:45:52,618"
2015-08-20 08:45:52.630  WARN 7700 --- [nio-9050-exec-9] o.s.c.n.z.filters.post.SendErrorFilter   : Error during filtering

com.netflix.zuul.exception.ZuulException: Forwarding error
        at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.forward(RibbonRoutingFilter.java:142)
        at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.run(RibbonRoutingFilter.java:107)
        at com.netflix.zuul.ZuulFilter.runFilter(ZuulFilter.java:112)
        at com.netflix.zuul.FilterProcessor.processZuulFilter(FilterProcessor.java:197)
        at com.netflix.zuul.FilterProcessor.runFilters(FilterProcessor.java:161)
        at com.netflix.zuul.FilterProcessor.route(FilterProcessor.java:120)
        at com.netflix.zuul.ZuulRunner.route(ZuulRunner.java:84)
        at com.netflix.zuul.http.ZuulServlet.route(ZuulServlet.java:111)
        at com.netflix.zuul.http.ZuulServlet.service(ZuulServlet.java:77)
        at org.springframework.web.servlet.mvc.ServletWrappingController.handleRequestInternal(ServletWrappingController.java:158)
        at org.springframework.cloud.netflix.zuul.web.ZuulController.handleRequestInternal(ZuulController.java:43)
        at org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:146)
        at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:50)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:966)
        at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:857)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:618)
        at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:842)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.boot.actuate.autoconfigure.EndpointWebMvcAutoConfiguration$ApplicationContextHeaderFilter.doFilterInternal(EndpointWebMvcAutoConfiguration.java:291)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.boot.actuate.trace.WebRequestTraceFilter.doFilterInternal(WebRequestTraceFilter.java:102)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:85)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.boot.actuate.autoconfigure.MetricFilterAutoConfiguration$MetricsFilter.doFilterInternal(MetricFilterAutoConfiguration.java:90)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:516)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1086)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:659)
        at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1558)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1515)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: examplemsRibbonCommand timed-out and no fallback available.
        at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:782)
        at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:769)
        at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:77)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at com.netflix.hystrix.AbstractCommand$DeprecatedOnFallbackHookApplication$1.onError(AbstractCommand.java:1444)
        at com.netflix.hystrix.AbstractCommand$FallbackHookApplication$1.onError(AbstractCommand.java:1334)
        at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:314)
        at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:306)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable.unsafeSubscribe(Observable.java:7466)
        at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:78)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$1.run(AbstractCommand.java:923)
        at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:41)
        at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:37)
        at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable.run(HystrixContextRunnable.java:57)
        at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$2.tick(AbstractCommand.java:943)
        at com.netflix.hystrix.util.HystrixTimer$1.run(HystrixTimer.java:98)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 common frames omitted
Caused by: java.util.concurrent.TimeoutException: null
        at com.netflix.hystrix.AbstractCommand$9.call(AbstractCommand.java:589)
        at com.netflix.hystrix.AbstractCommand$9.call(AbstractCommand.java:570)
        at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:77)
        ... 15 common frames omitted

If I wait a minute or two the service is removed from eureka and eventually this gets propagated to the zuulproxy, which means no more traffic to the stopped service. But my assumption was that ribbon/hysterix would have handled a non responding service more elegant/rapidly.

Any hint/suggestion? Thx Magnus

like image 757
Magnus Avatar asked Aug 20 '15 08:08

Magnus


1 Answers

1/ TIMEOUT

Zuul requests are monitored by Hystrix whose purpose (in that application) is to apply timeouts on long running requests.

Hystrix provides two different ways to execute commands and enforce timeouts: SEMAPHORE and THREAD execution isolation.

When THREAD isolation is used, Hystrix commands are executed on a separate thread from a thread pool. Hystrix then "pauses" the thread holding the incoming request until a response is received from the down stream server or a timeout occurs.

When SEMAPHORE isolation is used, Hystrix commands are executed on the request thread. Timeouts are detected only after a response is received from the down stream server. So if you configure Zuul/Hystrix with a timeout of 5s and your service takes 30s to complete, your client will be notified of the timeout only after 30s - even if the service responded successfully (!)

Netflix recommends THREAD execution by default except in some rare cases. Unfortunately, the SpringCloud Zuul integration changed it to SEMAPHORE for reasons unknown to me. See Why is ZUUL forcing a SEMAPHORE isolation to execute its Hystrix commands? for more information.

This explains why you receive a 500 error although the remaining live server was successfully contacted.

2/ RETRY

Ribbon is used to make the actual call to remote service. It uses information provided by Eureka to determine the available services and the corresponding addresses. Eureka uses a local cache that is updated every 30 seconds. So as @spencergibb said, it is likely to hold obsolete information for a while (dead server) - but this is expected.

Ribbon automatically retries when it fails to connect/contact a service. It can be configured to retry the same server a couple of time before trying another. I don't remember the default values nor the actual configuration property, but personally I have been using the following settings:

# Max number of retries on the same server (excluding the first try)
ribbon.maxAutoRetries = 1

# Max number of next servers to retry (excluding the first server)
ribbon.MaxAutoRetriesNextServer = 2

3/ CONNECT TIMEOUT

From your logs it appears it takes about 1s to fail the connect attempt to the remote service. This very long for a stopped service. Attempts to connect to a TCP port with no service listening should fail immediately (at least if the host/ip is reachable and the connect attempt doesn't end in the void)...

The connect timeout is controlled by the following property - make sure you set it to a descent value:

# Connect timeout used by Apache HttpClient
ribbon.ConnectTimeout=3000

# Read timeout used by Apache HttpClient
ribbon.ReadTimeout=5000

Hope this information helps you to troubleshoot your problem ;-)

like image 181
Bertrand Renuart Avatar answered Oct 26 '22 15:10

Bertrand Renuart