 

Python multiprocessing pool: maxtasksperchild

I have been dabbling with Python's multiprocessing library, and although it provides an incredibly easy-to-use API, its documentation is not always very clear. In particular, I find the maxtasksperchild argument passed to an instance of the Pool class very confusing.

The following comes directly from Python's documentation (3.7.2):

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.

The above raises more questions for me than it answers. Is it bad for a worker process to live as long as the pool? What makes a worker process 'fresh', and when is that desired? In general, when should you set maxtasksperchild explicitly instead of letting it default to None, and what are considered best practices for maximizing processing speed?

From @Darkonaut's amazing answer on chunksize I now understand what chunksize does and represents. Since supplying a value for chunksize impacts the number of 'tasks', I was wondering whether there are any considerations to be made regarding how the two interact, to ensure maximum performance?
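For what it's worth, the task count itself follows from chunksize: if each chunk of inputs is handed to a worker as one task, as Darkonaut's chunksize answer describes, then the relationship is just a ceiling division (the helper below is my own back-of-envelope illustration, not part of the library):

```python
import math

def n_tasks(n_items, chunksize):
    # each chunk of inputs is submitted to a worker as one task
    return math.ceil(n_items / chunksize)

print(n_tasks(100, 4))   # 25 tasks for 100 items in chunks of 4
print(n_tasks(100, 13))  # 8 (the last chunk is smaller)
```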

Thanks!

asked Mar 03 '19 by Marnix.hoh


People also ask

How do you pass multiple arguments in multiprocessing Python?

Use Pool.starmap(). The starmap() function calls the target function with multiple arguments, so it can be used instead of the map() function. This is probably the preferred approach for executing a target function that takes multiple arguments in a multiprocessing pool.
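A minimal sketch of starmap() (the add function and inputs are my own example):

```python
from multiprocessing import Pool

def add(x, y):
    # target function taking two arguments
    return x + y

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # each tuple is unpacked into add's positional arguments
        print(pool.starmap(add, [(1, 2), (3, 4), (5, 6)]))  # [3, 7, 11]
```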

What is pool in multiprocessing Python?

Python multiprocessing Pool can be used for parallel execution of a function across multiple input values, distributing the input data across processes (data parallelism). Below is a simple Python multiprocessing Pool example.
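As a simple sketch of that data-parallel pattern (square and the input range are my own illustration):

```python
from multiprocessing import Pool

def square(n):
    # CPU-bound work applied to each input value
    return n * n

if __name__ == "__main__":
    # distribute the inputs across 4 worker processes
    with Pool(processes=4) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
```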

How many processes should be running Python multiprocessing?

You can configure the number of processes via the Pool constructor, and this works the same way whether or not you create the pool with a context manager so that it is automatically shut down. If Windows is your operating system, the number of workers must be less than or equal to 61.
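A common starting point is one worker per CPU, capped for Windows (a sketch under that assumption; double is a made-up example function):

```python
import os
from multiprocessing import Pool

def double(n):
    return n * 2

if __name__ == "__main__":
    n_workers = os.cpu_count() or 1
    # respect the 61-worker cap that applies on Windows
    if os.name == "nt":
        n_workers = min(n_workers, 61)
    with Pool(processes=n_workers) as pool:
        print(pool.map(double, range(6)))  # [0, 2, 4, 6, 8, 10]
```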

Is multiprocessing faster than multithreading?

Multiprocessing outshines threading in cases where the program is CPU intensive and doesn't have to do any IO or user interaction. For example, any program that just crunches numbers will see a massive speedup from multiprocessing; in fact, threading will probably slow it down.


1 Answer

Normally you don't need to touch this. Sometimes problems can arise, for example with code that calls outside Python and leaks memory. Limiting the number of tasks a worker process completes before it gets replaced then helps, because the "unused resources" it erroneously accumulates are released when the process gets scrapped. Starting a new, "fresh" process keeps the problem contained. Because replacing a process takes time, for performance you should leave maxtasksperchild at its default. If you run into unexplainable resource problems some day, you can try setting maxtasksperchild=1 to see whether that changes anything. If it does, something is likely leaking.
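You can observe the replacement directly by returning each worker's PID (a small sketch of my own; report_pid is a made-up helper):

```python
import os
from multiprocessing import Pool

def report_pid(_):
    # return the id of the worker process that handled this task
    return os.getpid()

if __name__ == "__main__":
    # one worker, replaced after every single task
    with Pool(processes=1, maxtasksperchild=1) as pool:
        pids = pool.map(report_pid, range(4), chunksize=1)
    # with maxtasksperchild=1 each task should run in a fresh process,
    # so the PIDs are (practically) all distinct
    print(len(set(pids)))
```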

answered Oct 12 '22 by Darkonaut