Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the SSL connection between a Spring Boot app and Consul failing after a few minutes?

I'm in the process of upgrading an environment with new versions of Ubuntu, Consul and Spring Boot. At first glance, everything seems to be working just fine. The app connects to Consul, requests its configuration and boots up. After a few minutes however, something breaks and this message is repeated approximately every 2 seconds:

com.ecwid.consul.transport.TransportException: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
    at com.ecwid.consul.transport.AbstractHttpTransport.executeRequest(AbstractHttpTransport.java:77)
    at com.ecwid.consul.transport.AbstractHttpTransport.makeGetRequest(AbstractHttpTransport.java:34)
    at com.ecwid.consul.v1.ConsulRawClient.makeGetRequest(ConsulRawClient.java:128)
    at com.ecwid.consul.v1.catalog.CatalogConsulClient.getCatalogServices(CatalogConsulClient.java:120)
    at com.ecwid.consul.v1.ConsulClient.getCatalogServices(ConsulClient.java:372)
    at org.springframework.cloud.consul.discovery.ConsulCatalogWatch.catalogServicesWatch(ConsulCatalogWatch.java:129)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
    at java.base/sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1313)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1152)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1055)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:395)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)

I traced the fact that the message appears every 2 seconds to the apps health-check. Once the error occurs for the first time, it continues to occur on each subsequent health-check. This was confirmed by turning the health-check off and rebooting. This caused the error to occur exactly once, when the TTL on the Consul-data was reached.

This is where my understanding of the problem ends. I've tried to trace it to several things, but none of these led to a solution:

  • Checking the certificates - The certificates used are generated by Vault, signed by an intermediate certificate, signed by a self-signed root-certificate. These are then placed in a pkcs12 bundle, with the key and provided to the application. This works on all TLS-connections, also from the CLI tools and with curl. This seems like a dead end.
  • Network Connectivity - Since the connection is being reset, I tried to see if it was due to firewall or security-group issues. However, the relevant port (8501) is open for both TCP and UDP traffic and all manual tests with nc show the ports to be reachable.
  • IPv6 bug - Somewhere, I found a post saying this could be due to a bug with IPv6. I tried turning IPv6 off on the machine, reboot everything and try again. No luck, still the same error.
  • Versions of Consul - I tried running the application in our old environment, where Consul 1.2.3 is running and the error there does not show up. I'm still in the process of trying to find out if there is a specific Consul-version where this problem begins to occur, but haven't found it yet.
  • TLS bugs - Between Consul 1.2.3 and 1.7.2 there have been some changes to Consul's TLS-support as well as to the underlying Go TLS implementations. This came to light when testing with Consul 1.4.0, which provided a slightly different TLS error. Some suggestions on the internet were that there are conflicting implementations between Go and OpenJDK. I tried forcing the Java-application to use TLS 1.2, but again, no luck.
  • Handshake Debugging - Based on a tip from the comments, I used -Djavax.net.debug=ssl:handshake to find out what is happening during the handshake. During the first couple of minutes, the produced extra output shows what looks to me like normal handshakes. Once the problem occurs, the output of the handshakes stops right after "Produced Client Hello message" with a "Remote host terminated the handshake". I haven't been able to do the same with the other side of this connection. Consul is a Golang application. If anyone knows how to get the same debugging-information for a Golan app, please advise.

I hope someone has an idea about how to find the cause of this problem, or better yet, have a solution for it.

like image 409
jhkuperus Avatar asked May 15 '20 07:05

jhkuperus


1 Answers

After some more digging and trying other versions of things. I found that using GraalVM produces a different, but slightly more descriptive error. When trying to connect to the Consul-application, it immediately terminates with this message:

Caused by: javax.net.ssl.SSLHandshakeException: extension (5) should not be presented in certificate_request
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:263)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:254)
    at java.base/sun.security.ssl.SSLExtensions.<init>(SSLExtensions.java:90)
    at java.base/sun.security.ssl.CertificateRequest$T13CertificateRequestMessage.<init>(CertificateRequest.java:818)
    at java.base/sun.security.ssl.CertificateRequest$T13CertificateRequestConsumer.consume(CertificateRequest.java:922)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:177)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:164)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1151)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1062)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:402)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:220)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:164)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:139)
    at com.ecwid.consul.transport.AbstractHttpTransport.executeRequest(AbstractHttpTransport.java:61)

This then led me to an issue on the Golang GitHub-page: https://github.com/golang/go/issues/35722. It details similar issues from various people, but constantly with slightly different details. In that thread, there is mention of a discrepancy between TLS 1.3 implementations between Go and Java. The OpenJDK-maintainers also chip in and refer to this issue: https://bugs.openjdk.java.net/browse/JDK-8236039.

It has been fixed and closed, but is not yet available in any of my regular binary distributions. I will try to check whether that version actually fixes the problem. However, there is a workaround by forcing Java to use only TLS1.2. You can do this by adding -Djdk.tls.client.protocols=TLSv1.2 to the startup arguments.

like image 140
jhkuperus Avatar answered Oct 21 '22 07:10

jhkuperus