What can cause a spontaneous EPIPE error without either end calling close() or crashing?

Question

I have an application that consists of two processes (let's call them A and B), connected to each other through Unix domain sockets. Most of the time it works fine, but some users report the following behavior:

A sends a request to B. This works. A now starts reading the reply from B.
B sends a reply to A. The corresponding write() call returns an EPIPE error, and as a result B calls close() on the socket. However, A did not close() the socket, nor did it crash.
A's read() call returns 0, indicating end-of-file. A thinks that B prematurely closed the connection.

Users have also reported variations of this behavior, e.g.:

A sends a request to B. This works partially, but before the entire request is sent, A's write() call returns EPIPE, and as a result A calls close() on the socket. However, B did not close() the socket, nor did it crash.
B reads a partial request and then suddenly gets an EOF.

The problem is I cannot reproduce this behavior locally at all. I've tried OS X and Linux. The users are on a variety of systems, mostly OS X and Linux.

Things that I've already tried and considered:

Double close() bugs (close() is called twice on the same file descriptor): probably not, as that would result in EBADF errors, and I haven't seen any.
Increasing the maximum file descriptor limit. One user reported that this worked for him, the rest reported that it did not.

What else can possibly cause behavior like this? I know for certain that neither A nor B closes the socket prematurely, and I know for certain that neither of them has crashed, because both A and B were able to report the error. It is as if the kernel suddenly decided to pull the plug on the socket for some reason.

user206268 · Accepted Answer

Perhaps you could try strace as described in: http://modperlbook.org/html/6-9-1-Detecting-Aborted-Connections.html

I assume that your problem is related to the one described here: http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable

Unfortunately, I'm having a similar problem myself and couldn't manage to fix it with the advice given there. However, perhaps the SO_LINGER approach works for you.
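For what it's worth, a pattern often suggested alongside (or instead of) SO_LINGER is to shut down the write side, drain whatever the peer still sends until EOF, and only then close(). Here is a minimal sketch of that idea in C. The function name graceful_close is made up for illustration, and whether this helps with the Unix domain socket case in the question is an assumption, not something I have verified.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: half-close, drain until EOF, then close.
 * Returns 0 on success, -1 if any step reported an error.
 * Note: this blocks until the peer closes or shuts down its write side;
 * real code would probably want a timeout (e.g. via poll()). */
int graceful_close(int fd)
{
    char buf[4096];
    ssize_t n;
    int rc = 0;

    /* Tell the peer we are done sending; it will see EOF on read(). */
    if (shutdown(fd, SHUT_WR) == -1 && errno != ENOTCONN)
        rc = -1;

    /* Discard anything the peer still sends, until EOF or a real error. */
    do {
        n = read(fd, buf, sizeof buf);
    } while (n > 0 || (n == -1 && errno == EINTR));
    if (n == -1)
        rc = -1;

    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

The point of the drain loop is that close() is only called once the peer has signalled that it is finished too, so neither side tears the connection down while the other still has data in flight.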

mark4o · Answer

shutdown() may have been called on one of the socket endpoints.
If either side may fork and execute a child process, ensure that the FD_CLOEXEC (close-on-exec) flag is set on the socket file descriptor if you did not intend for it to be inherited by the child. Otherwise the child process could (accidentally or otherwise) be manipulating your socket connection.
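To illustrate both points, here is a rough sketch in C. set_cloexec sets the close-on-exec flag with fcntl(); write_errno_after_child_shutdown is a made-up demo in which a forked child that inherited the socket calls shutdown() on it, after which the parent's write() on the other end fails even though the parent never closed anything. Both function names are hypothetical. The demo uses a plain fork() for brevity; FD_CLOEXEC only takes effect when the child calls exec(). On Linux I would expect the returned errno to be EPIPE; other systems may behave differently.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Mark fd close-on-exec so that exec()ed children do not inherit it.
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical demo: a child that inherited one end of a socket pair
 * shuts it down; the parent then writes on the other end.
 * Returns the errno of the parent's write(), 0 if the write succeeded,
 * or -1 if the setup failed.
 * Side effect: SIGPIPE is ignored process-wide so that write() reports
 * the failure through errno instead of killing the process. */
int write_errno_after_child_shutdown(void)
{
    int sv[2];
    pid_t pid;
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    signal(SIGPIPE, SIG_IGN);

    pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: shutdown() acts on the shared socket itself, not just
         * on the child's copy of the descriptor. */
        shutdown(sv[0], SHUT_RDWR);
        _exit(0);
    }

    waitpid(pid, NULL, 0);

    /* Parent: it still holds both descriptors open, yet the write fails. */
    n = write(sv[1], "x", 1);
    err = (n == -1) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    return err;
}
```

Unlike close(), which only drops one reference to the socket, shutdown() changes the state of the connection for every process holding a descriptor to it. That is why a stray child can produce exactly the "nobody closed it, nobody crashed" symptom.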

Tags:

unix

posix

sockets

ipc

Hongli

