I have an application that consists of two processes (let's call them A and B), connected to each other through Unix domain sockets. Most of the time it works fine, but some users report the following behavior:
Users have also reported variations of this behavior, e.g.:
The problem is I cannot reproduce this behavior locally at all. I've tried OS X and Linux. The users are on a variety of systems, mostly OS X and Linux.
Things that I've already tried and considered:
What else could possibly cause behavior like this? I know for certain that neither A nor B calls close() on the socket prematurely, and I know for certain that neither of them has crashed, because both A and B were able to report the error. It is as if the kernel suddenly decided to pull the plug on the socket for some reason.
Perhaps you could try strace as described in: http://modperlbook.org/html/6-9-1-Detecting-Aborted-Connections.html
I assume that your problem is related to the one described here: http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
Unfortunately I'm having a similar problem myself and couldn't get it fixed with the advice given there. However, perhaps the SO_LINGER approach will work for you.
shutdown() may have been called on one of the socket endpoints.
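To illustrate why this matters even when neither side calls close(), here is a minimal Python sketch (the function name peer_view_after_shutdown is made up for illustration, and it assumes a Unix-like system). It duplicates one endpoint's descriptor, as an inherited copy in a child process would be, and calls shutdown() on the copy. Because shutdown() acts on the underlying connection rather than on a single descriptor, the peer sees end-of-file even though both original descriptors are still open.

```python
import socket


def peer_view_after_shutdown():
    """Show what endpoint A observes after shutdown() on a copy of B."""
    # A connected pair of Unix domain stream sockets, standing in for A and B.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # A second descriptor for B's end, similar to one inherited by a child.
    b_copy = b.dup()

    # shutdown() affects the connection itself, not just this descriptor.
    b_copy.shutdown(socket.SHUT_RDWR)
    b_copy.close()

    # A now reads end-of-file (an empty bytes object) instead of blocking.
    data = a.recv(1)

    # Writing from A typically fails as well (usually EPIPE); the exact
    # error is platform dependent, so it is only captured here.
    try:
        a.send(b"x")
        send_error = None
    except OSError as exc:
        send_error = exc

    # B's original descriptor was never closed.
    b_still_open = b.fileno() != -1

    a.close()
    b.close()
    return data, send_error, b_still_open


if __name__ == "__main__":
    data, send_error, b_still_open = peer_view_after_shutdown()
    print("recv on A returned:", data)
    print("send on A raised:", send_error)
    print("B's original descriptor still open:", b_still_open)
```

If the symptoms reported by users look like this (an unexpected EOF or a broken pipe while both processes are alive), a stray shutdown() on some copy of the descriptor is worth ruling out.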
If either side may fork and execute a child process, ensure that the FD_CLOEXEC (close-on-exec) flag is set on the socket file descriptor if you did not intend for it to be inherited by the child. Otherwise the child process could (accidentally or otherwise) be manipulating your socket connection.
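As a rough sketch of setting the flag, the following Python snippet uses fcntl with F_GETFD and F_SETFD, which mirrors the equivalent C calls. The helper names set_cloexec and has_cloexec are made up for illustration. Note that recent Python versions already create sockets as non-inheritable, so the example first makes the socket inheritable to imitate a descriptor created without the flag.

```python
import fcntl
import socket


def set_cloexec(sock):
    """Set FD_CLOEXEC so the descriptor is closed across exec()."""
    fd = sock.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def has_cloexec(sock):
    """Return True if FD_CLOEXEC is currently set on the descriptor."""
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFD)
    return bool(flags & fcntl.FD_CLOEXEC)


if __name__ == "__main__":
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # Imitate a descriptor created without close-on-exec.
    a.set_inheritable(True)
    print("before:", has_cloexec(a))

    set_cloexec(a)
    print("after:", has_cloexec(a))

    a.close()
    b.close()
```

The flag only takes effect at exec(); a child that forks without calling exec() still holds a copy of the descriptor and should close it explicitly if it does not need it.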