
Mystery pthread problem with fork()

I have a program which:

  • has a main thread (1) which starts a server thread (2) and another thread (4).
  • the server thread (2) does an accept(), then creates a new thread (3) to handle each connection.

At some point, thread (4) does a fork/exec to run another program, which should connect to the socket that thread (2) is listening on. Occasionally this fails or takes an unreasonably long time, and it's extremely difficult to diagnose. If I strace the system, it appears that the fork/exec has worked, the accept has happened, and the new connection-handling thread (3) has been created, but nothing happens in that thread (using strace -ff, the output file for the relevant pid is empty).

Any ideas?

pjc50 asked May 19 '26 23:05


2 Answers

I came to the conclusion that it was probably this phenomenon:

http://kerneltrap.org/mailarchive/linux-kernel/2008/8/15/2950234/thread

as the bug is difficult to trigger on our development systems but is generally reported by users running on large shared machines; also, the forked application starts a JVM, which itself creates a lot of threads. The problem is also associated with the machine being heavily loaded and with extensive memory usage (we have a machine with 128 GB of RAM, and processes may be 10-100 GB in size).

I've been reading the O'Reilly pthreads book, which explains pthread_atfork(), and suggests the use of a "surrogate parent" process forked from the main process at startup from which subprocesses are run. It also suggests the use of a pre-created thread pool. Both of these seem like good ideas, so I'm going to implement at least one of them.
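For illustration, here is a minimal sketch of the "surrogate parent" idea, not the book's code. It assumes the helper is forked at startup, before any threads exist, and receives shell command lines over a socketpair. The names start_surrogate, run_via_surrogate, surrogate_loop and CMD_MAX are hypothetical, and the surrogate here waits for each command to finish, which is a simplification (a long-running child such as the JVM would need an asynchronous reply instead).

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CMD_MAX 256  /* hypothetical fixed request size */

/* Read exactly n bytes; returns 0 on success, -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Write exactly n bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0 && errno == EINTR) continue;
        if (w <= 0) return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Runs in the surrogate, which is small and single-threaded, so the
 * fork() below does not have to duplicate a huge multithreaded process. */
static void surrogate_loop(int fd)
{
    char cmd[CMD_MAX];
    while (read_full(fd, cmd, sizeof cmd) == 0) {
        int status = -1;               /* reported if fork() fails */
        cmd[CMD_MAX - 1] = '\0';
        pid_t child = fork();
        if (child == 0) {
            close(fd);
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);                /* exec failed */
        }
        if (child > 0)
            while (waitpid(child, &status, 0) < 0 && errno == EINTR) {}
        if (write_full(fd, &status, sizeof status) < 0) break;
    }
}

/* Call early in main(), before creating threads or allocating much memory.
 * Returns the parent's end of the socketpair, or -1 on error. */
int start_surrogate(pid_t *surrogate_pid)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return -1; }
    if (pid == 0) {
        close(sv[0]);
        surrogate_loop(sv[1]);         /* returns when the parent closes its end */
        _exit(0);
    }
    close(sv[1]);
    *surrogate_pid = pid;
    return sv[0];
}

/* Ask the surrogate to run a shell command; returns the waitpid()-style
 * status, or -1 on a transport error or if the command is too long. */
int run_via_surrogate(int fd, const char *command)
{
    char cmd[CMD_MAX] = {0};
    int status;
    if (strlen(command) >= CMD_MAX) return -1;
    strcpy(cmd, command);
    if (write_full(fd, cmd, sizeof cmd) < 0) return -1;
    if (read_full(fd, &status, sizeof status) < 0) return -1;
    return status;
}
```

In this sketch, thread (4) would call run_via_surrogate() instead of doing its own fork/exec. If several threads can launch programs, calls on the same descriptor would need to be serialized, for example with a mutex.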

pjc50 answered May 21 '26 14:05


It looks like a deadlock condition. Look for blocking functions such as accept(); the problem is probably there.

