Context
I need to run a multiprocessing.Process inside a multiprocessing.pool.ThreadPool. It seems weird at first, but it is the only way I have found to deal with segfaults that can occur because I am using a C++ shared library. If a segfault happens, the process is killed, and I can check process.exitcode and handle it.
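For reference, the exitcode check looks roughly like this (a minimal sketch; the SIGSEGV is raised artificially here to stand in for a real crash in the library, and negative exitcodes are POSIX-specific):

```python
import multiprocessing, os, signal

def fake_native_crash():
    # Stand-in for the C++ library: kill ourselves with a segfault signal
    os.kill(os.getpid(), signal.SIGSEGV)

if __name__ == '__main__':
    p = multiprocessing.Process(target=fake_native_crash)
    p.start()
    p.join()
    # A negative exitcode -N means the child was killed by signal N,
    # so a segfault shows up as -signal.SIGSEGV (-11 on Linux)
    print(p.exitcode)
```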
Problem
After a while, a deadlock happens when I try to join the process.
Here is a simplified version of my code:
```python
import sys, time, multiprocessing
from multiprocessing.pool import ThreadPool

def main():
    # Launch 8 workers
    pool = ThreadPool(8)
    it = pool.imap(run, range(500))
    while True:
        try:
            it.next()
        except StopIteration:
            break

def run(value):
    # Each worker launches its own Process
    process = multiprocessing.Process(target=run_and_might_segfault, args=(value,))
    process.start()
    while process.is_alive():
        sys.stdout.write('.')
        sys.stdout.flush()
        time.sleep(0.1)
    # Will never join after a while, because of a mystery deadlock
    process.join()
    # Deal with process.exitcode to log errors

def run_and_might_segfault(value):
    # Load a shared library and do stuff (could throw a C++ exception, segfault, ...)
    print(value)

if __name__ == '__main__':
    main()
```
And here is a possible output:
```
➜ ~ python m.py
..0
1
........8
.9
.......10
......11
........12
13
........14
........16
........................................................................................
```
As you can see, process.is_alive() is always true after a few iterations; the process will never join.
If I CTRL-C the script, I get this stack trace:
```
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 680, in next
    item = self._items.popleft()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "m.py", line 30, in <module>
    main()
  File "m.py", line 9, in main
    it.next()
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 684, in next
    self._cond.wait(timeout)
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
KeyboardInterrupt
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
```
P.S. Using Python 3.5.2 on macOS.
Any kind of help is appreciated, thanks.
Edit
I tried using Python 2.7, and it works well. Maybe it is a Python 3.5-only issue?
A deadlock is a condition that may happen in a system composed of multiple processes that can access shared resources. A deadlock is said to occur when two or more processes are waiting for each other to release a resource. None of the processes can make any progress.
The problem is also reproduced on the latest build of CPython - Python 3.7.0a0 (default:4e2cce65e522, Oct 13 2016, 21:55:44).
If you attach to one of the stuck processes with gdb, you'll see that it's trying to acquire a lock in the sys.stdout.flush() call:
```
(gdb) py-list
 263            import traceback
 264            sys.stderr.write('Process %s:\n' % self.name)
 265            traceback.print_exc()
 266        finally:
 267            util.info('process exiting with exitcode %d' % exitcode)
>268            sys.stdout.flush()
 269            sys.stderr.flush()
 270
 271        return exitcode
```
The Python-level backtrace looks like this:
```
(gdb) py-bt
Traceback (most recent call first):
  File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/process.py", line 268, in _bootstrap
    sys.stdout.flush()
  File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "deadlock.py", line 17, in run
    process.start()
  File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/rpodolyaka/src/cpython/Lib/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/rpodolyaka/src/cpython/Lib/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/rpodolyaka/src/cpython/Lib/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
```
At the interpreter level it looks like this:
```
(gdb) frame 6
(gdb) list
287            return 0;
288        }
289        relax_locking = (_Py_Finalizing != NULL);
290        Py_BEGIN_ALLOW_THREADS
291        if (!relax_locking)
292            st = PyThread_acquire_lock(self->lock, 1);
293        else {
294            /* When finalizing, we don't want a deadlock to happen with daemon
295             * threads abruptly shut down while they owned the lock.
296             * Therefore, only wait for a grace period (1 s.). ... */
(gdb) p /x self->lock
$1 = 0xd25ce0
(gdb) p /x self->owner
$2 = 0x7f9bb2128700
```
Note that, from the point of view of this particular child process, the lock is still owned by one of the threads in the parent process (LWP 1105):
```
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f9bb5559440 (LWP 1102) "python" 0x00007f9bb5157577 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0,
    futex_word=0xe4d340) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  2    Thread 0x7f9bb312a700 (LWP 1103) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0x7f9bb2929700 (LWP 1104) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  4    Thread 0x7f9bb2128700 (LWP 1105) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  5    Thread 0x7f9bb1927700 (LWP 1106) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  6    Thread 0x7f9bb1126700 (LWP 1107) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  7    Thread 0x7f9bb0925700 (LWP 1108) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  8    Thread 0x7f9b9bfff700 (LWP 1109) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  9    Thread 0x7f9b9b7fe700 (LWP 1110) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  10   Thread 0x7f9b9affd700 (LWP 1111) "python" 0x00007f9bb4780253 in select () at ../sysdeps/unix/syscall-template.S:84
  11   Thread 0x7f9b9a7fc700 (LWP 1112) "python" 0x00007f9bb5157577 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0,
    futex_word=0x7f9b80001ed0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  12   Thread 0x7f9b99ffb700 (LWP 1113) "python" 0x00007f9bb5157577 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0,
    futex_word=0x7f9b84001bb0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
```
So it is indeed a deadlock, and it happens because you write to and flush sys.stdout from multiple threads concurrently in the original process while also creating subprocesses: by the nature of the fork(2) system call, children inherit the parent's memory, including acquired locks. The fork() calls must have been performed while the lock was held, and even when the parent process finally releases it, the children won't see that, as each of them now has its own memory space that was copied on write.

Thus, you need to be very careful when mixing multithreading with multiprocessing, and make sure all locks are properly released before fork() if they are to be used in the child processes.
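One way to sidestep the lock inheritance entirely (a sketch of an alternative approach, not part of the original code) is the 'spawn' start method, which starts each child in a fresh interpreter instead of fork()ing the parent's memory:

```python
import multiprocessing

def child(value):
    print(value)

if __name__ == '__main__':
    # 'spawn' launches a brand-new interpreter for each child, so locks
    # held by parent threads (such as the one guarding sys.stdout's
    # buffer) are not inherited, at the cost of slower process startup.
    ctx = multiprocessing.get_context('spawn')
    p = ctx.Process(target=child, args=(42,))
    p.start()
    p.join()
```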
It's very similar to what is described in http://bugs.python.org/issue6721
Note that if you remove the interactions with sys.stdout from your snippet, it will work correctly.
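For example, the polling loop in run() could be replaced with join() plus a timeout, so the worker threads never touch sys.stdout around fork() (a sketch of one possible fix; the 60-second timeout is an arbitrary choice):

```python
import multiprocessing

def run_and_might_segfault(value):
    # placeholder for the real work that calls into the shared library
    return value

def run(value):
    process = multiprocessing.Process(target=run_and_might_segfault, args=(value,))
    process.start()
    # join() with a timeout replaces the is_alive()/sys.stdout polling loop:
    # nothing is written to stdout from the worker threads, so a child
    # cannot inherit a held stdout buffer lock.
    process.join(timeout=60)
    if process.is_alive():   # still running after the timeout: give up on it
        process.terminate()
        process.join()
    return process.exitcode  # negative value => child killed by a signal
```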