
javax.net.ssl.SSLException: Read error: ssl=0x9524b800: I/O error during system call, Connection reset by peer

Our clients have started seeing hundreds of these "SSLException: Connection reset by peer" errors over the last couple of weeks, and I can't figure out why.

  1. We're using Retrofit with OkHttp, with no special configuration:

    public class OkHttpClientProvider implements IOkHttpClientProvider {

        OkHttpClient okHttpClient;

        public OkHttpClientProvider() {
            this.okHttpClient = createClient();
        }

        public OkHttpClient getOkHttpClient() {
            return this.okHttpClient;
        }

        private OkHttpClient createClient() {
            return new OkHttpClient();
        }
    }

The client provider above is a singleton. The RestAdapter is built using this injected client (we use Dagger):

    RestAdapter.Builder restAdapterBuilder = new RestAdapter.Builder()
            .setConverter(converter)
            .setEndpoint(networkRequestDetails.getServerUrl())
            .setClient(new OkClient(okHttpClientProvider.getOkHttpClient()))
            .setErrorHandler(new NetworkSynchronousErrorHandler(eventBus));

Based on Stack Overflow answers, here is what I've found out so far:

  1. The keep-alive duration on the server is 180 seconds; OkHttp has a default of 300 seconds

  2. The server returns "Connection: close" in its response headers, but the client request sends "Connection: Keep-Alive"

  3. The server supports TLS 1.0 / 1.1 / 1.2 and uses OpenSSL

  4. Our servers recently moved to another hosting provider in another geography, so I don't know whether these are DNS failures or not

  5. We've tried tweaking things like keepAlive, reconfigured OpenSSL on the server but for some reason the Android client keeps getting this error

  6. It happens immediately without any delay when you try to use the app to post something or pull to refresh (it doesn't even go to network or have a delay before this exception happens which would imply the connection is already broken). But trying it multiple times somehow "fixes it" and we get a success. It happens again later

  7. We've invalidated our DNS entries on the server to see if this is what caused it, but that hasn't helped

  8. It mostly happens on LTE, but I've seen it on Wi-Fi as well
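As a reading aid for points 1 and 6, here is one possible explanation, stated as an assumption rather than a confirmed diagnosis: if the client pool keeps an idle connection for up to 300 seconds while the server drops it after 180 seconds, a connection that has been idle for between 180 and 300 seconds still looks reusable to the client although the server has already given up on it. The sketch below only does that arithmetic; the class and method names are hypothetical and not part of OkHttp or Retrofit.

```java
/** Hypothetical helper: models the mismatch between client and server keep-alive durations. */
final class KeepAliveWindow {
    private KeepAliveWindow() {}

    /**
     * Returns true when a pooled connection that has been idle for idleSeconds
     * would still be kept by the client although the server has already closed it.
     */
    static boolean isLikelyStale(long idleSeconds,
                                 long serverKeepAliveSeconds,
                                 long clientKeepAliveSeconds) {
        boolean serverHasClosed = idleSeconds >= serverKeepAliveSeconds;
        boolean clientStillPools = idleSeconds < clientKeepAliveSeconds;
        return serverHasClosed && clientStillPools;
    }
}
```

With the values from point 1, `KeepAliveWindow.isLikelyStale(200, 180, 300)` is true, while idle times of 60 or 400 seconds fall outside the window. If this assumption holds, keeping the client-side duration below the server's would close the window.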

I don't want to disable keep-alive because most modern clients don't do that. Also, we're using OkHttp 2.4 and this is a problem on post-Ice Cream Sandwich devices, so I was hoping it would take care of these underlying networking issues. The iOS client also gets these exceptions, but roughly 100 times less often (the iOS client uses AFNetworking 2.0). I'm struggling to find new things to try at this point. Any help or ideas?

Update: adding the full stack trace through OkHttp

    retrofit.RetrofitError: Read error: ssl=0x9dd07200: I/O error during system call, Connection reset by peer
        at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:390)
        at retrofit.RestAdapter$RestHandler.invoke(RestAdapter.java:240)
        at java.lang.reflect.Proxy.invoke(Proxy.java:397)
        at $Proxy15.getAccessTokenUsingResourceOwnerPasswordCredentials(Unknown Source)
        at com.company.droid.repository.network.NetworkRepository.getAccessTokenUsingResourceOwnerPasswordCredentials(NetworkRepository.java:76)
        at com.company.droid.ui.login.LoginTask.doInBackground(LoginTask.java:88)
        at com.company.droid.ui.login.LoginTask.doInBackground(LoginTask.java:23)
        at android.os.AsyncTask$2.call(AsyncTask.java:292)
        at java.util.concurrent.FutureTask.run(FutureTask.java:237)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
        at java.lang.Thread.run(Thread.java:818)
    Caused by: javax.net.ssl.SSLException: Read error: ssl=0x9dd07200: I/O error during system call, Connection reset by peer
        at com.android.org.conscrypt.NativeCrypto.SSL_read(Native Method)
        at com.android.org.conscrypt.OpenSSLSocketImpl$SSLInputStream.read(OpenSSLSocketImpl.java:699)
        at okio.Okio$2.read(Okio.java:137)
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:211)
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:306)
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:300)
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196)
        at com.squareup.okhttp.internal.http.HttpConnection.readResponse(HttpConnection.java:191)
        at com.squareup.okhttp.internal.http.HttpTransport.readResponseHeaders(HttpTransport.java:80)
        at com.squareup.okhttp.internal.http.HttpEngine.readNetworkResponse(HttpEngine.java:917)
        at com.squareup.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:793)
        at com.squareup.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:439)
        at com.squareup.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:384)
        at com.squareup.okhttp.internal.huc.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:497)
        at com.squareup.okhttp.internal.huc.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105)
        at com.squareup.okhttp.internal.huc.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25)
        at retrofit.client.UrlConnectionClient.readResponse(UrlConnectionClient.java:73)
        at retrofit.client.UrlConnectionClient.execute(UrlConnectionClient.java:38)
        at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:321)
        at retrofit.RestAdapter$RestHandler.invoke(RestAdapter.java:240)
        at java.lang.reflect.Proxy.invoke(Proxy.java:397)
        at $Proxy15.getAccessTokenUsingResourceOwnerPasswordCredentials(Unknown Source)
        at com.company.droid.repository.network.NetworkRepository.getAccessTokenUsingResourceOwnerPasswordCredentials(NetworkRepository.java:76)
        at com.company.droid.ui.login.LoginTask.doInBackground(LoginTask.java:88)
        at com.company.droid.ui.login.LoginTask.doInBackground(LoginTask.java:23)
        at android.os.AsyncTask$2.call(AsyncTask.java:292)
        at java.util.concurrent.FutureTask.run(FutureTask.java:237)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
        at java.lang.Thread.run(Thread.java:818)
Rickster asked May 29 '15 20:05



2 Answers

I recently ran into this issue while working on some legacy code. Googling showed that it is reported everywhere but without any concrete resolution, so I took the exception message apart piece by piece. My analysis follows.

Analysis:

  1. SSLException: the exception was raised by the SSL (Secure Sockets Layer) machinery, whose API lives in the javax.net.ssl package of the JDK (OpenJDK / Oracle JDK / Android SDK).
  2. Read error: ssl=#: I/O error during system call: an error occurred while reading from the secure socket, inside the native system libraries or drivers. Note that every platform (Solaris, Windows, etc.) has its own socket library that SSL sits on top of; Windows, for example, uses Winsock.
  3. Connection reset by peer: this message is reported by the system library (Solaris reports ECONNRESET, Windows reports WSAECONNRESET). It means that the socket used for the data transfer is no longer usable because the existing connection was forcibly closed by the remote host. A new secure connection between the client and the host has to be created.

Reason:

Having understood the message, I looked for what can cause the connection reset and came up with the following reasons:

  • The peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close.
  • This error may also result if a connection was broken because keep-alive activity detected a failure while one or more operations were in progress. Operations that were in progress fail with "Network dropped connection on reset" (WSAENETRESET on Windows), and subsequent operations fail with "Connection reset by peer" (WSAECONNRESET on Windows).
  • If the target server is protected by a firewall, which is true in most cases, the idle timeout associated with the port (often loosely called its Time to Live, or TTL) forcibly closes an idle connection once that timeout expires. This is the case of interest here.

Resolution:

  1. Events on the server side, such as a sudden service stop, a reboot, or a disabled network interface, cannot be handled by any means.
  2. On the server side, configure the firewall for the given port with a higher Time to Live (TTL) or timeout value, such as 3600 seconds.
  3. Clients can "try" to keep the network active to avoid or reduce Connection reset by peer errors.
  4. Normally, ongoing network traffic keeps the connection alive and the exception is not seen frequently. A strong Wi-Fi connection is the least likely to run into Connection reset by peer.
  5. On 2G, 3G and 4G mobile networks, where packet delivery is intermittent and depends on network availability, traffic may not arrive in time to reset the timeout timer on the server side, which results in Connection reset by peer.
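Point 3 in the list above can be sketched as a small periodic "ping" scheduler. This is a minimal illustration using only the JDK; the `Heartbeat` class is hypothetical, and the `ping` task stands in for whatever lightweight request your API offers. Whether such traffic actually prevents resets depends on the network path, and on mobile it costs battery and data.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a lightweight "ping" task at a fixed interval. */
final class Heartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Schedules ping to run every intervalMillis, starting after one interval. */
    ScheduledFuture<?> start(Runnable ping, long intervalMillis) {
        return scheduler.scheduleAtFixedRate(() -> {
            try {
                ping.run();
            } catch (RuntimeException e) {
                // Swallow the failure: an exception escaping the task would
                // cancel all further executions of the schedule.
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the scheduler and any pending pings. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A caller would create one `Heartbeat`, call `start(task, interval)` with an interval shorter than the firewall timeout, and call `close()` when the app no longer needs the connection.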

Here are the settings that various forums suggest for resolving the issue:

  • ConnectionTimeout: used only while the connection is being established. If the host is slow to accept the connection, a higher value makes the client wait longer for it.
  • SoTimeout: the socket timeout, that is, the maximum time to wait for a data packet while still considering the connection active. If no data is received within the given time, the connection is assumed to be stalled or broken.
  • Linger: how long the socket should stay open after close is called on it while data is still queued to be sent.
  • TcpNoDelay: whether to disable the buffering that holds and accumulates small TCP packets and sends them once a threshold is reached (Nagle's algorithm). Setting this to true skips the buffering so that every write is sent immediately. The smaller, more frequent packets increase network traffic and may slow the network down.
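For reference, the four settings above map onto plain `java.net.Socket` calls roughly as follows. This is a sketch under the assumption that you control the socket directly; the `SocketTuning` class and its parameter names are hypothetical, and HTTP libraries usually expose the same knobs through their own configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: applies the settings listed above to a plain java.net.Socket. */
final class SocketTuning {
    private SocketTuning() {}

    /** Creates an unconnected socket with SoTimeout, Linger and TcpNoDelay applied. */
    static Socket newTunedSocket(int readTimeoutMillis, int lingerSeconds) throws IOException {
        Socket socket = new Socket();
        // SoTimeout: a read() blocks at most this long before SocketTimeoutException.
        socket.setSoTimeout(readTimeoutMillis);
        // Linger: close() may block up to this many seconds while queued data is sent.
        socket.setSoLinger(true, lingerSeconds);
        // TcpNoDelay: disable Nagle's algorithm so small writes go out immediately.
        socket.setTcpNoDelay(true);
        return socket;
    }

    /** ConnectionTimeout: applies only while the connection is being established. */
    static void connect(Socket socket, String host, int port, int connectTimeoutMillis)
            throws IOException {
        socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
    }
}
```

None of these calls generates traffic on an idle connection, which matches the observation that they do not keep it alive.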

None of the above parameters helps keep the connection alive, so they are ineffective for this problem.

I found one setting that may help resolve the issue, exposed through these functions:

    socket.setKeepAlive(true)
    HttpConnectionParams.setSoKeepalive(params, true)
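If the HTTP client in use accepts a `javax.net.SocketFactory`, the keep-alive setting named above can be applied in one place. The wrapper below is a sketch with a hypothetical class name; it delegates socket creation and only adds `setKeepAlive(true)`. Keep in mind that the operating system decides how often keep-alive probes are sent (the Linux default idle time before the first probe is two hours), so this alone may not beat a short firewall timeout.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.SocketFactory;

/** Hypothetical wrapper: enables SO_KEEPALIVE on every socket the delegate creates. */
final class KeepAliveSocketFactory extends SocketFactory {
    private final SocketFactory delegate;

    KeepAliveSocketFactory(SocketFactory delegate) {
        this.delegate = delegate;
    }

    private static Socket withKeepAlive(Socket socket) throws IOException {
        socket.setKeepAlive(true); // ask the OS to send TCP keep-alive probes
        return socket;
    }

    @Override
    public Socket createSocket() throws IOException {
        return withKeepAlive(delegate.createSocket());
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return withKeepAlive(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        return withKeepAlive(delegate.createSocket(host, port, localHost, localPort));
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return withKeepAlive(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(InetAddress address, int port, InetAddress localAddress,
                               int localPort) throws IOException {
        return withKeepAlive(delegate.createSocket(address, port, localAddress, localPort));
    }
}
```

For example, `new KeepAliveSocketFactory(SocketFactory.getDefault()).createSocket()` returns an unconnected socket whose `getKeepAlive()` is true.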

How did I resolve my issue?

  • Call HttpConnectionParams.setSoKeepalive(params, true).
  • Catch the SSLException and check whether its message contains Connection reset by peer.
  • If it does, store the download/read progress and create a new connection.
  • If possible, resume the download/read; otherwise restart it.
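The catch-and-retry part of the steps above could look roughly like this. It is a sketch, not the exact code used in the answer: `ResetRetry` and its methods are hypothetical names, the request is passed in as a `Callable` that must open a fresh connection on each call, and progress tracking for resumed downloads is left out.

```java
import java.util.concurrent.Callable;
import javax.net.ssl.SSLException;

/** Hypothetical helper: retries a request when it fails with "Connection reset by peer". */
final class ResetRetry {
    private ResetRetry() {}

    /** Walks the cause chain looking for an SSLException that mentions a connection reset. */
    static boolean isConnectionReset(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (t instanceof SSLException
                    && message != null
                    && message.contains("Connection reset by peer")) {
                return true;
            }
        }
        return false;
    }

    /** Calls request up to maxAttempts times; any other failure is rethrown immediately. */
    static <T> T callWithRetry(Callable<T> request, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {
                if (!isConnectionReset(e)) {
                    throw e;
                }
                last = e; // connection was reset: try again on the next iteration
            }
        }
        throw last;
    }
}
```

Retrying blindly is safest for reads; for requests that change server state, a retry may repeat the operation if the first attempt reached the server before the reset.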

I hope the details help. Happy Coding...

Devendra Vaja answered Oct 08 '22 09:10



If you are using Nginx and hit a similar problem, this might help:

Scan your domain with this SSL test URL and see whether the connection is allowed for your device version.

If older devices (for example, below Android 4.4.2) cannot connect because of their TLS support, try adding this to your Nginx config file:

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
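To see the client side of the same question, the JDK can report which protocol versions its default TLS configuration supports and enables. This is a sketch with a hypothetical class name; the output depends on the runtime, and on Android it reflects the platform's TLS provider for that OS version.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

/** Hypothetical helper: reports the TLS protocol versions of the default SSLContext. */
final class TlsProbe {
    private TlsProbe() {}

    /** Protocols enabled by default for new connections, e.g. TLSv1.2. */
    static List<String> enabledProtocols() throws NoSuchAlgorithmException {
        return Arrays.asList(
                SSLContext.getDefault().getDefaultSSLParameters().getProtocols());
    }

    /** Protocols the runtime could use if they were explicitly enabled. */
    static List<String> supportedProtocols() throws NoSuchAlgorithmException {
        return Arrays.asList(
                SSLContext.getDefault().getSupportedSSLParameters().getProtocols());
    }
}
```

Comparing `TlsProbe.enabledProtocols()` on an affected device with the `ssl_protocols` line on the server shows whether the two sides share at least one version.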
jayiitb answered Oct 08 '22 10:10
