
Spring Boot random "SSLException: Connection reset" in Kubernetes with JDK11

Context:

  • We have a Spring Boot (2.3.1.RELEASE) web app
  • It's written in Java 8 but running inside of a container with Java 11 (openjdk:11.0.6-jre-stretch).
  • It has a DB connection and an upstream service that is called via https (simple RestTemplate#exchange method) (this is important!)
  • It is deployed inside of a Kubernetes cluster (not sure if this is important)

Problem:

  • Every day, I see a small percentage of requests towards the upstream service fail with this error: I/O error on GET request for "https://upstream.xyz/path": Connection reset; nested exception is javax.net.ssl.SSLException: Connection reset
  • The errors are totally random and happen intermittently.
  • We previously had a similar error (javax.net.ssl.SSLProtocolException: Connection reset) that was related to JRE 11 and its TLS 1.3 negotiation issue. We updated our Docker image to the one mentioned above and that fixed it.
  • This is the stack trace from the error:
java.net.SocketException: Connection reset
    at java.base/java.net.SocketInputStream.read(Unknown Source)
    at java.base/java.net.SocketInputStream.read(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketInputRecord.read(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(Unknown Source)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
    at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:739)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:674)
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:583)
....

Configuration:

public static RestTemplate create(final int maxTotal, final int defaultMaxPerRoute,
                                  final int connectTimeout, final int readTimeout,
                                  final String userAgent) {
    final Registry<ConnectionSocketFactory> schemeRegistry = RegistryBuilder.<ConnectionSocketFactory>create()
            .register("http", PlainConnectionSocketFactory.getSocketFactory())
            .register("https", SSLConnectionSocketFactory.getSocketFactory())
            .build();

    final PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager(schemeRegistry);
    connManager.setMaxTotal(maxTotal);
    connManager.setDefaultMaxPerRoute(defaultMaxPerRoute);

    final CloseableHttpClient httpClient = HttpClients.custom()
            .setConnectionManager(connManager)
            .setUserAgent(userAgent)
            .setDefaultRequestConfig(RequestConfig.custom()
                                             .setConnectTimeout(connectTimeout)
                                             .setSocketTimeout(readTimeout)
                                             .setExpectContinueEnabled(false).build())
            .build();

    return new RestTemplateBuilder()
            .requestFactory(() -> new HttpComponentsClientHttpRequestFactory(httpClient))
            .build();
}

Has anyone experienced this issue? When I turn on debug logs on the http client, it is overflowing with noise and I am unable to discern anything useful...

Urosh T. asked Nov 12 '20 19:11


2 Answers

I'll share my experience with this error, since it is probably the same problem you are facing: my stack trace matched yours.

The key detail is that it happens randomly, which is why I suspect the same cause.

HTTP connections are made through an HTTP client library (Apache HttpClient).

An HTTP client usually manages a reusable pool of connections, and this pool has a limit. In our case, the pool was sometimes (randomly) fully occupied, with no free connections left to use.

  1. Increase the pool size, or
  2. Implement a back-off retry mechanism that tries again to obtain a connection from the pool of HTTP connections when an HTTP request fails.
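
The retry idea in option 2 could be sketched roughly as follows. This is a minimal, library-free illustration and not code from the application in question: the class BackoffRetry, its call method, and all parameter names are hypothetical. With RestTemplate, I/O failures surface as ResourceAccessException, so the predicate would presumably test for that type instead of IOException. Retrying is only safe for idempotent requests such as GET.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper, not part of Spring or Apache HttpClient.
class BackoffRetry {

    /**
     * Runs the action up to maxAttempts times, doubling the pause after each
     * retryable failure. Non-retryable failures and the final failure propagate.
     */
    static <T> T call(Callable<T> action, int maxAttempts, long initialDelayMillis,
                      Predicate<Exception> retryable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delay); // back off before the next attempt
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated upstream call: fails twice with a reset, then succeeds.
        String body = call(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Connection reset");
            }
            return "ok";
        }, 5, 10L, e -> e instanceof IOException);
        System.out.println(body + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

In a real service the action would wrap the RestTemplate#exchange call, and the attempt count and delays would need tuning against the upstream's latency budget.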

If you wonder how to tune the underlying HTTP client that is used in Spring Boot, check out this post.

Jude Niroshan answered Sep 17 '22 13:09


We had a similar problem when migrating to AWS/Kubernetes. I've found out why.

You're using a connection pool. The default behavior of the PoolingHttpClientConnectionManager is that it will reuse connections. So connections will not be closed immediately when your request is done. This will save resources by not having to reconnect all the time.

A Kubernetes cluster uses a NAT (Network Address Translation) for outgoing connections. When a connection is not used for a certain amount of time, the connection will be removed from the NAT-table, and the connection will be broken. This causes the seemingly random SSLExceptions.

On AWS, connections are removed from the NAT table when they have been idle for 350 seconds. Other Kubernetes environments might have other settings.

See https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-troubleshooting.html

The solution:

Disable connection-reuse:

final CloseableHttpClient closeableHttpClient = HttpClients.custom()
    .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
    .setConnectionManager(poolingHttpClientConnectionManager)
    .build();

Or, let the httpClient evict connections that are idle for too long:

return HttpClients.custom()
    // Read the javadocs: may not be used when the instance of HttpClient is created inside an EJB container.
    .evictIdleConnections(300, TimeUnit.SECONDS)
    .setConnectionManager(poolingHttpClientConnectionManager)
    .build();
        

Or call setConnectionKeepAliveStrategy(....) with a custom ConnectionKeepAliveStrategy that never returns -1 or a duration of more than 300 seconds (the return value is in milliseconds).
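
The capping rule behind such a keep-alive strategy could look roughly like this. It is a sketch of the arithmetic only, kept free of library dependencies: KeepAliveCap and capMillis are hypothetical names, and the 300-second cap is simply the value suggested above, chosen to stay below the 350-second NAT timeout.

```java
// Hypothetical helper; the class and method names are illustrative only.
class KeepAliveCap {

    /**
     * Returns how long (in milliseconds) an idle connection may be kept.
     * A hint of 0 or less means "no limit" in HttpClient 4.x, so it is
     * replaced by the cap, as is any hint longer than the cap.
     */
    static long capMillis(long serverHintMillis, long maxIdleMillis) {
        if (serverHintMillis <= 0 || serverHintMillis > maxIdleMillis) {
            return maxIdleMillis;
        }
        return serverHintMillis;
    }

    public static void main(String[] args) {
        long cap = 300_000L; // 300 seconds, below the 350-second NAT idle timeout
        System.out.println(capMillis(-1L, cap));      // prints 300000 (no server hint)
        System.out.println(capMillis(600_000L, cap)); // prints 300000 (hint too long)
        System.out.println(capMillis(5_000L, cap));   // prints 5000 (hint respected)
    }
}
```

With HttpClient 4.x this would presumably be wired in as a lambda passed to setConnectionKeepAliveStrategy, feeding the result of DefaultConnectionKeepAliveStrategy.INSTANCE.getKeepAliveDuration(response, context) into capMillis as the server hint.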

vancoeverden answered Sep 18 '22 13:09