
Variety of HTTPS errors while communicating with the server from an Android app

UPDATE: 04 Jan 2015

I still have these issues. Users of our app have increased and I see all kinds of network errors. Our app sends out an email every time there is a network-related error in the app.

Our app performs financial transactions, so re-submits are not idempotent, and I am very wary of enabling HttpClient's retry feature. We have added some response caching on the server to handle re-submits done explicitly by the user. However, we still have no solution that works without a bad user experience.
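One common way to make re-submits safe is an idempotency key: the client generates an ID once per user operation and sends the same ID on every retry, and the server returns the cached response instead of running the transaction again. The sketch below is only an illustration of that idea, not the caching described above. The class name `IdempotentProcessor`, the `handle` method, and the string "receipt" responses are all hypothetical, and the cache is in-memory and unbounded, which a real server would replace with persistent storage and expiry.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical server-side helper: runs a transaction at most once per request ID.
final class IdempotentProcessor {
    // Assumption: responses fit in memory and never expire (illustration only).
    private final ConcurrentMap<String, String> responsesByRequestId = new ConcurrentHashMap<>();

    // Returns the stored response for a known request ID; otherwise runs the
    // transaction once and stores its response under that ID.
    String handle(String requestId, Supplier<String> transaction) {
        return responsesByRequestId.computeIfAbsent(requestId, id -> transaction.get());
    }

    public static void main(String[] args) {
        IdempotentProcessor server = new IdempotentProcessor();
        AtomicInteger executions = new AtomicInteger();

        // The client creates the ID once and reuses it for every retry of the same operation.
        String requestId = UUID.randomUUID().toString();

        String first = server.handle(requestId, () -> "receipt-" + executions.incrementAndGet());
        String retry = server.handle(requestId, () -> "receipt-" + executions.incrementAndGet());

        // prints "receipt-1 receipt-1 executions=1"
        System.out.println(first + " " + retry + " executions=" + executions.get());
    }
}
```

With something like this in place, a retry after a timeout no longer risks a duplicate transaction. Note that `computeIfAbsent` holds a lock on part of the map while the transaction runs, so a production version would likely record an "in progress" marker and do the work outside the map.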

Original Question

I have an Android app which posts data as part of a user operation. The data includes a few images; I package them as a Protobuf message (a byte array, in effect) and post it to the server over an HTTPS connection.

Though the app works fine for the most part, we are seeing connection errors occasionally. The issue has become more pronounced now that we have some users in relatively slow network areas (2G connections). However, it is not limited to slow connections; it is also seen with customers using WiFi and 3G connections.

Here are a few exceptions we notice in our app logs:

The one below happens after 5 minutes, as I had set the socket timeout to 5 minutes. The app was trying to post 145 KB of data in this case.

Stack trace:
java.net.SocketTimeoutException: Read timed out
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_read(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.read(OpenSSLSocketImpl.java:662)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:103)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:191)

The one below happened after 2.5 minutes (socket timeout was set to 5 minutes); the client was sending 144 KB of data.

Stack trace:
javax.net.ssl.SSLException: Write error: ssl=0x5e4f4640: I/O error during system call, Broken pipe
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_write(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLOutputStream.write(OpenSSLSocketImpl.java:704)
    at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:109)
    at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)

The one below happened after 1 minute.

Stack trace:
javax.net.ssl.SSLException: Connection closed by peer
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:378)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.<init>(OpenSSLSocketImpl.java:634)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.getInputStream(OpenSSLSocketImpl.java:605)

The one below happened after 77 seconds.

Stack trace:
javax.net.ssl.SSLException: SSL handshake aborted: ssl=0x5e2baf00: I/O error during system call, Connection reset by peer
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:378)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.<init>(OpenSSLSocketImpl.java:634)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.getInputStream(OpenSSLSocketImpl.java:605)
    at org.apache.http.impl.io.SocketInputBuffer.<init>(SocketInputBuffer.java:70)

The one below happened after 15 seconds (connect timeout is set to 15 seconds).

Time Taken : 15081
Stack trace:
org.apache.http.conn.ConnectTimeoutException: Connect to /103.xx.xx.xx:443 timed out
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:121)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:144)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:164)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:119)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:365)

Here are the source code snippets that I use for posting the request:

HttpParams params = new BasicHttpParams();
HttpConnectionParams.setConnectionTimeout(params, 15000); //15 seconds
HttpConnectionParams.setSoTimeout(params, 300000); // 5 minutes

HttpClient client = getHttpClient(params);
HttpPost post = new HttpPost(uri);
post.setEntity(new ByteArrayEntity(requestByteArray));
HttpResponse httpResponse = client.execute(post);

    ....

public static HttpClient getHttpClient(HttpParams params) {
    try {
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);

        SSLSocketFactory sf = new TrustAllCertsSSLSocketFactory(trustStore);
        sf.setHostnameVerifier(SSLSocketFactory.STRICT_HOSTNAME_VERIFIER);


        HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
        HttpProtocolParams.setContentCharset(params, HTTP.UTF_8);

        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
        registry.register(new Scheme("https", sf, 443));

        ClientConnectionManager ccm = new ThreadSafeClientConnManager(params, registry);
        DefaultHttpClient client = new DefaultHttpClient(ccm, params);
        // The line below disables retrying of the HTTP request when the connection times out.
        client.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(0, false));
        return client;
    } catch (Exception e) {
        return new DefaultHttpClient();
    }
}

I have read some forum posts suggesting that we should use the HttpURLConnection class instead. I did change the code to use https://code.google.com/p/basic-http-client/ as a hot fix. Though it worked on my Samsung phone, it had some issue on the phone the customer was using: it was not even able to connect to our site. I had to roll it back, though I can revisit it if the root cause can be pinned on DefaultHttpClient.

Our web server is nginx, and our web service runs on Apache Tomcat. Customers are mostly using Android 4.1+ phones. The customer from whose phone I retrieved the above stack traces is using a Micromax A110Q phone with Android 4.2.1.
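For reference, a plain HttpURLConnection POST of a byte array with explicit connect and read timeouts might look roughly like the sketch below. This is not the code from basic-http-client; the class name `UrlConnectionPoster` and the `post` method are made up for illustration, and only standard `java.net` calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: POSTs a byte array and returns the HTTP status code.
final class UrlConnectionPoster {
    static int post(URL url, byte[] payload, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setConnectTimeout(connectTimeoutMs); // e.g. 15000, as in the question
            conn.setReadTimeout(readTimeoutMs);       // e.g. 300000, as in the question
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Stream the body with a known length instead of buffering it internally.
            conn.setFixedLengthStreamingMode(payload.length);
            conn.setRequestProperty("Content-Type", "application/octet-stream");

            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }

            int status = conn.getResponseCode();

            // Drain the response body so the underlying connection can be released cleanly.
            InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (body != null) {
                try (InputStream in = body) {
                    byte[] buffer = new byte[4096];
                    while (in.read(buffer) != -1) {
                        // discard
                    }
                }
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as `UrlConnectionPoster.post(new URL(uri), requestByteArray, 15000, 300000)` would mirror the timeouts used with DefaultHttpClient above.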

Any inputs on this will be highly appreciated. Thanks a lot!

Update:

  1. I had noticed that we were not shutting down the connection manager, so I added the code below in the finally block of the code where I use the HTTP client.

     if (client != null) {
         client.getConnectionManager().shutdown();
     }

  2. Updated the nginx configuration to accept request bodies of up to 5 MB, as its default is 1 MB; some clients were submitting more than 1 MB and the server was severing the connection with a 413 error.

     client_max_body_size 5M;

  3. Also increased the nginx proxy read timeout, so that nginx waits longer for the response from the upstream server (Tomcat).

     proxy_read_timeout 300;

With the above changes, the errors have reduced a bit. In the last week, I have seen the following two types of errors:

  1. org.apache.http.conn.ConnectTimeoutException: Connect to /103.xx.xx.xxx:443 timed out. This happens after 15 seconds, which is my connect timeout. I am assuming that this happens because the client is unable to reach the server due to network slowness or, as @JaySoyer pointed out, maybe due to network switching.

  2. java.net.SocketTimeoutException: SSL handshake timed out at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method). This happens when the socket timeout expires. I am now using 1 minute as the socket timeout for small requests, and 3 and 6 minutes for payloads of up to 75 KB and larger, respectively.

However, these errors have reduced considerably: I am now seeing 1 failure in 100 requests, compared with 1 in 10 with the earlier version of my code.
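The size-based socket timeouts described in item 2 could be expressed as a small helper like the one below. This is a sketch under assumptions: the question does not say where "small" ends, so that cutoff is passed in as a hypothetical `smallLimitBytes` parameter, and the class and method names are invented for illustration.

```java
// Hypothetical helper that picks a socket timeout from the payload size.
final class SocketTimeouts {
    static final int ONE_MINUTE_MS = 60_000;
    // From the question: payloads of up to 75 KB get 3 minutes, larger ones get 6.
    static final int MEDIUM_LIMIT_BYTES = 75 * 1024;

    // smallLimitBytes is an assumption: the question does not define "small".
    static int forPayload(int payloadBytes, int smallLimitBytes) {
        if (payloadBytes <= smallLimitBytes) {
            return ONE_MINUTE_MS;      // small request: 1 minute
        }
        if (payloadBytes <= MEDIUM_LIMIT_BYTES) {
            return 3 * ONE_MINUTE_MS;  // up to 75 KB: 3 minutes
        }
        return 6 * ONE_MINUTE_MS;      // larger than 75 KB: 6 minutes
    }
}
```

The result would then be passed to `HttpConnectionParams.setSoTimeout(params, ...)` before creating the client.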

Wand Maker asked Jul 31 '14 12:07


2 Answers

I recently had to do an exhaustive analysis of my company's app, as we were seeing a bunch of similar errors and didn't know why. We ended up handing out custom builds of the app that logged their connection times, errors, signal quality, etc. to a file. We did that for weeks and collected thousands of data points. Keep in mind, we maintain a persistent connection while the app is open.

Turns out most of our errors were from switching networks. This is actually really common for an average user. Let's say a user is on an EDGE cell network, then walks within WiFi range, or vice versa. When this occurs, Android severs the cell connection and makes an entirely new connection over WiFi. From the app's perspective, it's similar to turning on airplane mode and then flicking it back off again. This even occurs when switching within a cell network, e.g. LTE to HSPA+. Each time this happens, Android fires off the network connectivity changed broadcast.

Of those you listed, this behavior was causing the following similar errors:

  • javax.net.ssl.SSLException: Write error: ssl=0x5e4f4640
  • javax.net.ssl.SSLException: SSL handshake aborted:

Sometimes the network switch was fast, sometimes slow. Turns out, we were not cleaning up our resources in time during the fast switches. As a result, we were attempting to reconnect to our servers over stale TCP connections, which threw even more odd errors.

So I guess the takeaway is: if you are maintaining a connection for a long period of time, expect to see the phone constantly switch between networks, especially when the signal is weak. When that network switch occurs, you'll see SSLExceptions, and that's completely normal. Just make sure you clean up your resources and reconnect properly.
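To make "clean up and reconnect" concrete, here is a minimal plain-Java sketch of the pattern. It is not our actual code: `ReconnectingSession`, `Connector`, and `onNetworkChanged` are hypothetical names, and on Android the `onNetworkChanged()` call would come from whatever component receives the connectivity changed broadcast.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical wrapper: drops the current connection on a network change and
// lazily opens a fresh one the next time a connection is requested.
final class ReconnectingSession<C extends Closeable> {

    // Hypothetical factory for whatever connection type the app uses.
    interface Connector<T> {
        T open() throws IOException;
    }

    private final Connector<C> connector;
    private C current; // null means "no live connection"

    ReconnectingSession(Connector<C> connector) {
        this.connector = connector;
    }

    // Call this when the platform reports a network switch.
    synchronized void onNetworkChanged() {
        C stale = current;
        current = null; // never hand out the old connection again
        if (stale != null) {
            try {
                stale.close();
            } catch (IOException ignored) {
                // The old network is gone, so a failure while closing is expected.
            }
        }
    }

    // Returns the live connection, opening a new one if the previous one was dropped.
    synchronized C connection() throws IOException {
        if (current == null) {
            current = connector.open();
        }
        return current;
    }
}
```

The point of the design is that the stale connection is released immediately when the switch is reported, rather than being discovered as broken on the next write.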

Jay Soyer answered Nov 13 '22 19:11


Since you are dealing with what looks like poor network connectivity, consider a more fault-tolerant HTTP client. The one I like is OkHttp. From their description:

OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to SSLv3 if the handshake fails.

The implementation would be mostly a drop-in replacement.

David S. answered Nov 13 '22 19:11