When conducting a stress test on some server code I wrote, I noticed that even though I am calling close() on the descriptor handle (and verifying the result for errors) that the descriptor is not released which eventually causes accept() to return an error "Too many open files".
Now I understand that this is because of the ulimit, what I don't understand is why I am hitting it if I call close() after each synchronous accept/read/send cycle?
I am validating that the descriptors are in fact there by running a watch with lsof:
ctsvr 9733 mike 1017u sock 0,7 0t0 3323579 can't identify protocol ctsvr 9733 mike 1018u sock 0,7 0t0 3323581 can't identify protocol ...
And sure enough there are about 1000 or so of them. Further more, checking with netstat I can see that there are no hanging TCP states (no WAIT or STOPPED or anything).
If I simply do a single connect/send/recv from the client, I do notice that the socket does stay listed in lsof; so this is not even a load issue.
The server is running on an Ubuntu Linux 64-bit machine.
Any thoughts?
close() closes a file descriptor, so that it no longer refers to any file and may be reused. Any record locks (see fcntl(2)) held on the file it was associated with, and owned by the process, are removed (regardless of the file descriptor that was used to obtain the lock).
A file descriptor is eventually closed by the close(2) system call or by the process' exit. By default, file descriptors 0, 1, and 2 are opened automatically by the C runtime library and represent the standard input, standard output, and standard error streams for a process.
A socket is just a special form of a file. For example, you can use the syscalls used on file descriptors, read() and write(), on socket descriptors.
in the server from gdb, and the client will then disconnect. This happens maybe once in 50,000 requests, but might not happen for extended periods. Then the thread picks up the socket and builds the response.
So using strace (thanks Gearoid), which I have no idea how I ever lived without, I noted I was in fact closing the descriptors.
However. And for the sake of posterity I lay bare my foolish mistake:
Socket::Socket() : impl(new Impl) {
impl->fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
....
}
Socket::ptr_t Socket::accept() {
auto r = ::accept(impl->fd, NULL, NULL);
...
ptr_t s(new Socket);
s->impl->fd = r;
return s;
}
As you can see, my constructor allocated a socket immediately, and then I replaced the descriptor with the one returned by accept - creating a leak. I had refactored the accept code from a standalone Acceptor class into the Socket class without changing this.
Using strace I could easily see socket() being run each time which lead to my light bulb moment.
Thanks all for the help!
Have you ever called perror() after close()? I think the returned string will give you some help;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With