Troubleshooting connections stuck in CLOSE_WAIT status

I have a Java application running in WebLogic 11g on Windows which, after several days, becomes unresponsive. One suspicious symptom I've noticed is that a large number of connections (about 3000) show up in netstat with a CLOSE_WAIT status even when the server is idle. Since the application server is managing the client connections, I'm not sure what's causing this. We also make a number of web service calls that loop back to the same server, but I believe those connections get closed properly. What else could cause this, and how does one troubleshoot a problem like this?

Rob H asked Apr 12 '11 13:04


4 Answers

CLOSE_WAIT is the state the local TCP state machine is in when the remote host has sent a FIN (closed its side of the connection) but the local application has not yet done the same and sent a FIN in reply. It's still possible for the local machine to send data at this point, though the remote host cannot receive it (unless it did only a half-close on the connection).

When the remote host closes (sends a FIN), your local application will get an event of some sort (it's a "read" event on the socket in the base C library), but reading from that connection will return an end-of-stream indication instead of data (a zero-length read in C, -1 from InputStream.read() in Java) to show that the connection has closed. At this point the local application should close the connection.

I know little about Java and nothing about WebLogic, but I suppose it's possible that the application is not handling that end-of-stream condition properly and thus never closing the connection.
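For readers on the Java side, here is a minimal sketch of the pattern described above: read until the stream reports end-of-stream, then close the socket so the connection can leave CLOSE_WAIT. The class and method names (EofCloseSketch, drainAndClose, demo) are made up for illustration, and the loopback demo is only an assumption about how one might try it out. It is not how WebLogic handles its own connections.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration class; not part of WebLogic or the original application.
final class EofCloseSketch {

    /** Reads until the peer closes, then closes our side; returns the bytes seen. */
    static int drainAndClose(Socket socket) throws IOException {
        int total = 0;
        // try-with-resources (Java 7+) closes the socket on every exit path,
        // which sends our FIN and lets the connection leave CLOSE_WAIT.
        // On older JVMs the same effect needs an explicit finally block.
        try (Socket s = socket; InputStream in = s.getInputStream()) {
            byte[] buf = new byte[1024];
            int n;
            // read() returns -1 once the peer's FIN has been received.
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Loopback demo: a client sends 5 bytes and closes; the server drains and closes. */
    static int demo() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 OutputStream out = client.getOutputStream()) {
                out.write(new byte[] {1, 2, 3, 4, 5});
                out.flush();
            } // the client closes here; its FIN puts the server side in CLOSE_WAIT
            Socket accepted = server.accept();
            return drainAndClose(accepted);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes read before end-of-stream: " + demo());
    }
}
```

If the while loop exited without the socket ever being closed (for example because an exception skipped the close call), the accepted socket would sit in CLOSE_WAIT until the process ends, which is the symptom described in the question.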

Brian White answered Nov 12 '22 09:11


I have been having the same issue and I have been studying sockets to get rid of this issue.

Let me say a few words, but first I must say I am not a Java programmer.

I will not explain what CLOSE_WAIT is, as Brian White already said everything that should be said.

To avoid CLOSE_WAIT pileups, you need to make sure connections are not torn down after every response. Whoever disconnects first ends up in TIME_WAIT, and the other side sits in CLOSE_WAIT until its application closes the socket. So, if your server is getting stuck in CLOSE_WAIT, it tells me that connections are being dropped after each response and the server is never closing its end of them.

You should avoid that by doing a few things.

1 - If your client application is not using the HTTP 1.1 protocol, you should set it to use that, because connections are persistent (keep-alive) by default in HTTP 1.1.

2 - If your client is running HTTP 1.1 and that does not work, or if you must use HTTP 1.0, you should set the Connection request header:

connection: keep-alive

This tells the server that neither the client nor the server should disconnect after completing a request. By doing that your server will not disconnect after every request it receives.

3 - In your client, reuse your socket. If you are creating a lot of client sockets in a loop, for example, you should create a socket once and then use it every time you need to send a request. The approach I used in my app is to have a socket pool and take an available socket from it (one that is already connected to the server and has the keep-alive property set). Then I use it, and when I am done I put it back in the pool to be reused.

4 - If you really need to disconnect after sending a request, make sure your client is the one that does it, and keep the connection: keep-alive header.
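The socket pool from point 3 could look roughly like the sketch below. This is not the code from my app; the class name SocketPoolSketch and its methods are invented for illustration, and setKeepAlive here enables TCP-level keep-alive probes, which is an assumption about what "keep-alive property" means and is separate from the HTTP header in point 2.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical fixed-size pool of already-connected sockets. */
final class SocketPoolSketch implements AutoCloseable {
    private final BlockingQueue<Socket> idle;

    SocketPoolSketch(InetAddress host, int port, int size) throws IOException {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            Socket s = new Socket(host, port);
            s.setKeepAlive(true); // TCP keep-alive (SO_KEEPALIVE), not the HTTP header
            idle.add(s);
        }
    }

    /** Blocks until a connected socket is free. */
    Socket borrow() throws InterruptedException {
        return idle.take();
    }

    /** Returns a socket to the pool; a socket that was closed is not put back. */
    void release(Socket s) {
        if (!s.isClosed()) {
            idle.offer(s);
        }
    }

    /** Closes every idle socket so none is left half-open. */
    @Override
    public void close() throws IOException {
        for (Socket s : idle) {
            s.close();
        }
        idle.clear();
    }

    /** Loopback demo: borrowing twice from a pool of one yields the same socket. */
    static boolean demo() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             SocketPoolSketch pool =
                 new SocketPoolSketch(server.getInetAddress(), server.getLocalPort(), 1)) {
            Socket first = pool.borrow();
            pool.release(first);
            Socket second = pool.borrow();
            pool.release(second);
            return first == second; // reused, so no new connection was opened
        }
    }
}
```

A real pool would also need to replace sockets that the server has closed while they were idle; this sketch simply drops them, so the pool shrinks over time.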

And yes, you may have problems when you have a lot of connections in CLOSE_WAIT or TIME_WAIT on the server side.

Check out this [link][1], which explains what keep-alive is.

I hope this was helpful. With those things I managed to solve my problem.

[1]: http://www.w3.org/Protocols/HTTP/1.1/draft-ietf-http-v11-spec-01.html#Persistent Connections

Rafael Colucci answered Nov 12 '22 09:11


The CLOSE_WAIT status means that the other side has initiated a connection close, but the application on the local side has not yet closed the socket.

It sounds like you have a bug in your local application.

caf answered Nov 12 '22 09:11


I found this quote about CLOSE_WAIT pileups: "Something is either preventing progress to occur in the HTTP session (we are stuck so never end up calling close), or some bug has been introduced that prevents the socket from being closed. There are a number of ways this can happen."

Think: Is there any way your application might be getting stuck while processing a request? Or WebLogic itself?

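Examine: Can you take Java thread dumps to see whether any of your threads are in fact getting stuck? (kill -SIGQUIT can be used for that on the Oracle JVM for Linux; since you are on Windows, pressing Ctrl+Break in the JVM's console window or running the JDK's jstack tool against the process should do the same.)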
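If sending a signal to the server process is awkward, a dump can also be produced from inside the JVM. The sketch below uses the standard Thread.getAllStackTraces() call; the class name ThreadDumpSketch is made up, and how you would trigger it inside WebLogic (a servlet, a JMX hook, and so on) is left open as an assumption.

```java
import java.util.Map;

// Hypothetical helper; only Thread.getAllStackTraces() is a real JDK API here.
final class ThreadDumpSketch {

    /** Builds a plain-text dump of every live thread's name, state and stack. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Taking several dumps a few seconds apart and comparing them is usually more telling than a single one: a request thread that shows the same stack frames in every dump is a good candidate for being stuck.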

Examine the client side: First, find out the IP address or hostname of the clients that are connected to the CLOSE_WAIT sockets. Then, see if anything suspicious is happening on those clients.
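With about 3000 such sockets, tallying them by remote address is easier with a small script than by eye. Here is a rough sketch that counts CLOSE_WAIT lines per remote host in netstat -an style text. The class name CloseWaitTally and the sample addresses are invented, and the parser only assumes that the column just before the CLOSE_WAIT state is the remote address:port, which holds for the usual Windows and Linux layouts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for summarising saved `netstat -an` output.
final class CloseWaitTally {

    /** Counts CLOSE_WAIT lines per remote host in `netstat -an` style output. */
    static Map<String, Integer> countByRemoteHost(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            for (int i = 2; i < cols.length; i++) {
                if (cols[i].equals("CLOSE_WAIT")) {
                    String remote = cols[i - 1];          // e.g. 10.0.0.7:51234
                    int colon = remote.lastIndexOf(':');  // strip the port
                    String host = colon > 0 ? remote.substring(0, colon) : remote;
                    counts.merge(host, 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the Windows `netstat -an` layout.
        List<String> sample = Arrays.asList(
            "  TCP    10.0.0.5:7001    10.0.0.7:51234    CLOSE_WAIT",
            "  TCP    10.0.0.5:7001    10.0.0.7:51240    CLOSE_WAIT",
            "  TCP    10.0.0.5:7001    10.0.0.9:40022    ESTABLISHED",
            "  TCP    10.0.0.5:7001    127.0.0.1:49876   CLOSE_WAIT");
        System.out.println(countByRemoteHost(sample));
    }
}
```

If most of the entries turn out to come from 127.0.0.1 or the server's own address, the loopback web service calls mentioned in the question would be the first place to look.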

Robin Green answered Nov 12 '22 10:11