We have a Webstart client that communicates to the server by sending serialized objects over HTTPS using java.net.HttpsURLConnection
.
Everything works perfectly fine on my local machine and on test servers located in our office, but I'm experiencing a very, very strange issue which is only occurring on our production and staging servers (and sporadically at that). The main difference I know of between those servers and the ones in our office is that they are located elsewhere and client-server communication with them is considerably slower, but it worked fine for a long time in production prior to this as well.
Anyway, here's what's happening:
Content-Type
on the HttpURLConnection
, calls getOutputStream()
on it to get the stream to write to.java.net.ConnectException: Connection timed out: connect at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(Unknown Source) at java.net.PlainSocketImpl.connectToAddress(Unknown Source) at java.net.PlainSocketImpl.connect(Unknown Source) at java.net.SocksSocketImpl.connect(Unknown Source) at java.net.Socket.connect(Unknown Source) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.connect(Unknown Source) at com.sun.net.ssl.internal.ssl.BaseSSLSocketImpl.connect(Unknown Source) at sun.net.NetworkClient.doConnect(Unknown Source) at sun.net.www.http.HttpClient.openServer(Unknown Source) at sun.net.www.http.HttpClient.openServer(Unknown Source) at sun.net.www.protocol.https.HttpsClient.(Unknown Source) at sun.net.www.protocol.https.HttpsClient.New(Unknown Source) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source) at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(Unknown Source)
Note that this is not a SocketTimeoutException
, which the connect()
method on HttpURLConnection
says it throws if the timeout expires before a connection can be established. Also, when this happens I am able to call conn.getResponseCode()
and I get a response code of 200.
EOFException
is thrown in ObjectInputStream
's constructor, which tries to read the serialization header but fails because the client never gets the OutputStream
to write to.In case it helps, here are the calls being made on the HttpsURLConnection
prior to the call to getOutputStream()
(edited to show only the calls being made rather than the whole structure of the code doing this):
HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
conn.setUseCaches(false);
conn.setReadTimeout(30000);
conn.setRequestProperty("Cookie", cookie);
conn.setDoOutput(true);
conn.setRequestProperty("Content-Type", "application/x-java-serialized-object");
conn.getOutputStream();
The thing is, I have no idea how any of this could be happening, especially given that it only happens occasionally (no clear pattern of activity that I can tell) and even then only when there's (relatively) high latency between the client and the server.
Given what I've been able to find so far about java.net.ConnectException: Connect timed out
, I wondered if it weren't some network or firewall issue on the network our servers are running on... but that doesn't make much sense to me given that the request is clearly getting through to the servlet. Also, other apps running on the same network have not reported similar issues.
Does anyone have any idea what the cause of this could be, or even what I should investigate?
TCP Socket Timeouts are caused when a TCP socket times out talking to the far end. Socket timeouts can occur when attempting to connect to a remote server, or during communication, especially long-lived ones.
How do you handle Java net ConnectException connection refused? Ans: To resolve the Java net error, first try to ping the destination host, if you are able to ping the host means the client and server machine are in the proper network. Your next step should be connecting to the server host and port using telnet.
connection timeout — a time period in which a client should establish a connection with a server. socket timeout — a maximum time of inactivity between two data packets when exchanging data with a server.
We have come across these in a similar case to yours. Usually at high load and not easy to reproduce on test. Have not fixed it yet but this is the steps we went through.
If it's a firewall issue, we would get a Connection Refused or the SocketTimeout exception.
1) Are you able to track these requests in the access log on the server - do they show an HTTP status 200 or 404 or something else? In our case, the server (IIS in this case) logs showed the client closed the connection and not the server. So that was a mystery.
Update: If the client always gets a 200, then the server has actually sent back some response but I suspect the response byte-size (if this is recorded in the access logs) will show a different value from that of the normal response size for that request.
If it shows the same size of response, then you have a (may be not plausible) condition that the server actually responded correctly but the client did not get the response back because the connection terminated somewhere in between.
2) The network admin teams looked at the TCP/IP traffic to determine which end (or intermediate router) is terminating the HTTP / TCP-IP conversation. And once we understand which end is terminating the connection is to look at why. Someone knowledgable enough could run snoop
3) Is there a max number of requests configured/restricted on the server - and is that throttling your connections?
4) Are there any intermediate load balancers at which requests could be dropped?
Update: One more thing we wanted to, but did not complete is to create a static route between client and server to reduce the number of hops in between and ensure no network related connection drops. See http://en.wikipedia.org/wiki/Static_routing
5) Another suggestion is setting the ConnectTimeout too to see if these work with a higher value. Update: You might want to try conn.getErrorStream()
Returns the error stream if the connection failed but the server sent useful data nonetheless. If the connection was not connected, or if the server did not have an error while connecting or if the server had an error but no error data was sent, this method will return null.
6) Could also try taking a set of thread dumps on the server 5 seconds apart, to see if any thread shows these incoming requests on the server.
Update: As of today we learnt to live with this problem, because we totalled the failure rate to be 200-300 out of 400,000 requests per day which is 0.00075 %
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With