Status of mixing multiprocessing and threading in Python

What are the best practices or work-arounds for using both multiprocessing and user threads in the same Python application on Linux, with respect to Issue 6721, "Locks in python standard library should be sanitized on fork"?

Why do I need both? I use child processes to do heavy computation that produces data structures much too large to return through a queue; instead, they must be stored to disk immediately. It seemed efficient to have each of these child processes monitored by a separate thread, so that when the child finished, the thread could handle the IO of reading the large (e.g. multi-GB) data back into the process where the result was needed for further computation, in combination with the results of other child processes. The child processes would intermittently hang, which I just (after much head pounding) found was 'caused' by using the logging module. Others have documented the problem here:

https://twiki.cern.ch/twiki/bin/view/Main/PythonLoggingThreadingMultiprocessingIntermixedStudy

which points to this apparently unsolved Python issue: Locks in python standard library should be sanitized on fork; http://bugs.python.org/issue6721

Alarmed at the difficulty I had tracking this down, I answered:

Are there any reasons not to mix Multiprocessing and Threading module in Python

with the rather unhelpful suggestion to 'Be careful' and links to the above.

But the lengthy discussion re: Issue 6721 suggests that it is a 'bug' to use both multiprocessing (or os.fork) and user threads in the same application. With my limited understanding of the problem, I find too much disagreement in the discussion to conclude what are the work-arounds or strategies for using both multiprocessing and threading in the same application. My immediate problem was solved by disabling logging, but I create a small handful of other (explicit) locks in both parent and child processes, and suspect I am setting myself up for further intermittent deadlocks.

Can you give practical recommendations for avoiding deadlocks when using locks and/or the logging module together with threading and multiprocessing in a Python (2.7, 3.2, 3.3) application?

ricopan asked Oct 20 '12 00:10


1 Answer

You will be safe if you fork off additional processes while you still have only one thread in your program (that is, fork from the main thread, before spawning worker threads).
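A minimal sketch of that ordering, assuming Python 3.4+ (for `multiprocessing.get_context`) and a Unix system where the "fork" start method is available. The names `heavy_computation`, `collect` and `run_workers` are hypothetical stand-ins, not anything from the question's code: every child is started while the parent is still single-threaded, and the monitor threads are created only afterwards.

```python
import multiprocessing as mp
import os
import tempfile
import threading


def heavy_computation(n, out_path):
    # Hypothetical stand-in for the real CPU-bound work; the result goes to disk.
    with open(out_path, "w") as f:
        f.write(str(sum(i * i for i in range(n))))


def collect(proc, out_path, results, index):
    # Runs in a monitor thread: wait for the child, then read its output file.
    proc.join()
    with open(out_path) as f:
        results[index] = int(f.read())


def run_workers(sizes):
    ctx = mp.get_context("fork")  # make the start method explicit (Unix only)
    results = [None] * len(sizes)
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [os.path.join(tmpdir, "result_%d.txt" % i)
                 for i in range(len(sizes))]

        # Step 1: fork every child while the parent has only its main thread.
        procs = [ctx.Process(target=heavy_computation, args=(n, path))
                 for n, path in zip(sizes, paths)]
        for proc in procs:
            proc.start()

        # Step 2: only now start the monitor threads.
        threads = [threading.Thread(target=collect, args=(proc, path, results, i))
                   for i, (proc, path) in enumerate(zip(procs, paths))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results
```

Because no thread exists in the parent at the moment of each fork, no other thread can be holding a lock (for example a logging handler's lock) that the child would inherit in a locked state. This only helps if all children can be created up front; it does not cover forking again later, once threads are running.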

Your use case looks like you don't even need the multiprocessing module; you can use subprocess (or even simpler os.system-like calls).
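A rough sketch of the subprocess variant, assuming Python 3. `WORKER_CODE`, `run_worker` and `run_all` are hypothetical names, and the inline `-c` program stands in for what would really be a separate worker script that writes its large result to disk. Each monitor thread launches its own child and waits for it; the child is a freshly exec'ed interpreter, so it does not start out with the parent's Python-level lock state.

```python
import subprocess
import sys
import threading

# Hypothetical worker; in practice this would be a separate script file.
WORKER_CODE = "import sys; print(sum(i * i for i in range(int(sys.argv[1]))))"


def run_worker(n, results, index):
    # Launch a fresh interpreter and block this thread until it exits.
    out = subprocess.check_output([sys.executable, "-c", WORKER_CODE, str(n)])
    results[index] = int(out.decode().strip())


def run_all(sizes):
    results = [None] * len(sizes)
    threads = [threading.Thread(target=run_worker, args=(n, results, i))
               for i, n in enumerate(sizes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Here the small result comes back over stdout only to keep the example short; for multi-GB results the worker would write a file and the thread would read it after the child exits, as described in the question.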

See also Is it safe to fork from within a thread?

Igor Nazarenko answered Sep 20 '22 04:09