Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Crazy HTTP problems after Spring Boot 2.0.1 upgrade

After upgrading to Spring Boot 2.0.1, I am experiencing inexplicable problems with my applications.

There are two applications involved. Requests to App 1 make a back-end call to App 2 to fetch some data. The call to App 2 goes through an AWS elastic load balancer (ELB). The client used to make the call is Apache Commons HttpClient wrapped by my utility.

After App 1 was upgraded to Boot 2.0.1, I am seeing a small minority of the calls from App 1 to App 2 hang for a long time (15 minutes). When I use JConsole to get a thread dump of the hung threads, I see this stack trace:

Stack trace: 
java.net.PlainSocketImpl.socketClose0(Native Method)
java.net.AbstractPlainSocketImpl.socketPreClose(AbstractPlainSocketImpl.java:693)
java.net.AbstractPlainSocketImpl.close(AbstractPlainSocketImpl.java:530)
   - locked java.lang.Object@29a91ad3
java.net.SocksSocketImpl.close(SocksSocketImpl.java:1075)
java.net.Socket.close(Socket.java:1495)
   - locked java.lang.Object@71e238ff
   - locked java.net.Socket@73f58606
sun.security.ssl.BaseSSLSocketImpl.close(BaseSSLSocketImpl.java:624)
   - locked sun.security.ssl.SSLSocketImpl@22264f18
sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1585)
sun.security.ssl.SSLSocketImpl.closeInternal(SSLSocketImpl.java:1723)
sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2020)
sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1135)
   - locked sun.security.ssl.SSLSocketImpl@22264f18
   - locked java.lang.Object@4338a60d
sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
   - locked sun.security.ssl.AppInputStream@23fd5b55
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
org.apache.http.impl.BHttpConnectionBase.fillInputBuffer(BHttpConnectionBase.java:344)
org.apache.http.impl.BHttpConnectionBase.isStale(BHttpConnectionBase.java:364)
org.apache.http.impl.conn.CPool.validate(CPool.java:71)
org.apache.http.impl.conn.CPool.validate(CPool.java:45)
org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:249)
   - locked org.apache.http.pool.AbstractConnPool$2@7c672c9a
org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:282)
org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:269)
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
com.kendelong.util.http.HttpConnectionService.doExecuteAndGetResponse(HttpConnectionService.java:243)
com.kendelong.util.http.HttpConnectionService.getResult(HttpConnectionService.java:189)
com.kendelong.util.http.IHttpConnectionService$getResult$0.call(Unknown Source)
com.hatchbaby.sub.util.MainSiteHttpProxyService.getMemberData(MainSiteHttpProxyService.groovy:68)
com.hatchbaby.sub.util.MainSiteHttpProxyService$$FastClassBySpringCGLIB$$3cc5cf68.invoke(<generated>)
org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:747)
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:89)
com.kendelong.util.performance.PerformanceMonitoringAspect.monitorInvocation(PerformanceMonitoringAspect.java:108)
com.kendelong.util.performance.PerformanceMonitoringAspect.monitorAnnotatedClasses(PerformanceMonitoringAspect.java:83)
sun.reflect.GeneratedMethodAccessor177.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:644)
org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:633)
org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689)
com.hatchbaby.sub.util.MainSiteHttpProxyService$$EnhancerBySpringCGLIB$$edc027d4.getMemberData(<generated>)
com.hatchbaby.sub.service.security.HatchBabyAuthenticationUserDetailsService.loadUserDetails(HatchBabyAuthenticationUserDetailsService.java:39)
com.hatchbaby.sub.service.security.HatchBabyAuthenticationUserDetailsService.loadUserDetails(HatchBabyAuthenticationUserDetailsService.java:20)
org.springframework.security.web.authentication.preauth.PreAuthenticatedAuthenticationProvider.authenticate(PreAuthenticatedAuthenticationProvider.java:103)
org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:174)
org.springframework.security.web.authentication.preauth.AbstractPreAuthenticatedProcessingFilter.doAuthenticate(AbstractPreAuthenticatedProcessingFilter.java:184)
org.springframework.security.web.authentication.preauth.AbstractPreAuthenticatedProcessingFilter.doFilter(AbstractPreAuthenticatedProcessingFilter.java:118)
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:215)
org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:178)
org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:357)
org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:270)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:109)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:81)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.AbstractRequestLoggingFilter.doFilterInternal(AbstractRequestLoggingFilter.java:245)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
com.hatchbaby.sub.web.RequestInfoFilter.doFilter(RequestInfoFilter.java:99)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
com.hatchbaby.sub.web.SpecialRedirectFilter.doFilter(SpecialRedirectFilter.java:43)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.ebaysf.web.cors.CORSFilter.handleSimpleCORS(CORSFilter.java:302)
org.ebaysf.web.cors.CORSFilter.doFilter(CORSFilter.java:160)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:496)
org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
   - locked org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@6ca9f281
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
java.lang.Thread.run(Thread.java:748)

The thread hangs in this state for roughly 15 minutes (actally 15 min 44 seconds plus/minus 5 seconds), or a rough integer multiple thereof. Eventually the request succeeds.

What is even odder is that when I compare the logs, I see the request enter App 1 and make the call to App 2 (say, at 8:00). The thread hangs in the socketClose0() method as seen in the thread dump. When the request finally completes 15 minutes later (at 8:15), the timestamps in the logs of App 2 and its ELB show that the request arrived the later time (8:15): it's as if the socket is trying to close for 15 minutes, it finally closes (without having contacted the App 2 server), and immediately retries. Usually it succeeds, but sometimes it gets stuck again for another 15 minutes (hence the multiples of 15 minutes). But the request doesn't even register at App 2's ELB until 15 minutes later (8:15).

I have seen this behavior in my QA environment (one server each) and in my production environment (2 servers each). If I roll the code back to the version with Spring Boot 1.5.9, the problem disappears. However, I have reverted most of the suspect libraries one-by-one to the Boot 1.5.9 version, and the problem persists. I have reverted httpclient (obviously), groovy, tomcat, and commons-codec. I was unable to revert Spring (reset spring.version property) as the app would not start. As part of the upgrade to Boot 2.0.1, I switched from EhCache distributed by JGroups to Hazelcast as the Hibernate second-level cache, but I have completely removed Hazelcast from the app and the problem persists. Reverting Hibernate or Spring Data JPA is more difficult because of code changes.

There are no timeouts set at 15 minutes in my app, of course. Both connection and socket timeouts for httpclient are 5 seconds.

Does anyone have any clue, or any troubleshooting suggestions?

UPDATE

Setting -Djavax.net.debug=all yields this output:

http-nio-8080-exec-6, setSoTimeout(1) called
[Raw read]: length = 5
0000: 15 03 03 00 1A                                     .....
[Raw read]: length = 26
0000: 98 1C 7C C4 27 0B 2B 40   E5 AC D7 38 A9 6F F7 73  ....'[email protected]
0010: 83 18 E8 15 14 7C 52 10   47 5E                    ......R.G^
http-nio-8080-exec-6, READ: TLSv1.2 Alert, length = 26
Padded plaintext after DECRYPTION:  len = 2
0000: 01 00                                              ..
http-nio-8080-exec-6, RECV TLSv1.2 ALERT:  warning, close_notify
http-nio-8080-exec-6, called closeInternal(false)
http-nio-8080-exec-6, SEND TLSv1.2 ALERT:  warning, description = close_notify
Padded plaintext before ENCRYPTION:  len = 2
0000: 01 00                                              ..
http-nio-8080-exec-6, WRITE: TLSv1.2 Alert, length = 26
[Raw write]: length = 31
0000: 15 03 03 00 1A 00 00 00   00 00 00 00 02 1F A3 99  ................
0010: 4C CD 61 EB 02 2C 14 0D   00 27 03 51 05 F7 1F     L.a..,...'.Q...
http-nio-8080-exec-6, called closeSocket(false)

At this point everything hangs. Looking with netstat on the client, I see the socket is in LAST_ACK state, meaning it's waiting for a final ACK packet.

I removed App 2's ELB from the mix and connected directly to the server for App 2. The problem persists. When client connections are in LAST_ACK state, there are no corresponding connections showing up in netstat on the server side. If the server had not sent the final ACK, it would be in FIN_WAIT_2, but since the connection is already gone, it must have sent the final ACK but it never arrived at the client.

How could a Spring Boot upgrade lead to blocked ACK packets on socket close operations?

All of this is happening on Amazon Linux 2017.9.

UPDATE 2

Five months later, same thing. I upgraded App2 to Spring Boot 2.0.5, no problems.

App1, when upgraded to 2.0.5, starts hanging it's threads. I turned on HttpClient wire logging, and what I see just where it's about to make the network call, there is no outgoing data on the wire logger, and 15m33s later I get an "end of stream" coming in and the request finally closes.

like image 214
Ken DeLong Avatar asked Nov 07 '22 07:11

Ken DeLong


1 Answers

Halleuja, I figure it out. My HTTP library was setting socketLinger = socketTimeout + 500. However, socketLinger and socketTimeout are quite different! My socketTimeout is set to 5000 ms. I released version 1.7.0 which stopped setting socketLinger and presto the problem went away.

I still don't know why I never saw problems with this before, or why the Spring Boot 2 upgrade caused the bug to manifest, but at least it's gone now!

like image 139
Ken DeLong Avatar answered Nov 15 '22 12:11

Ken DeLong