I have a simple main() function that processes a huge amount of data. Since I have an 8-core machine with plenty of RAM, it was suggested that I use Python's multiprocessing module to speed up the processing. Each subprocess will take about 18 hours to finish.
Long story short: I am not sure I have understood the behaviour of the multiprocessing module correctly.
I start the subprocesses roughly like this (huge_amount_of_data, chunk and start_process are stand-ins for my real code):
import multiprocessing

def main():
    data = huge_amount_of_data()
    data_chunks = chunk(data, cpu_cores)  # split the data into one subset per worker
    pool = multiprocessing.Pool(processes=cpu_cores)  # cpu_cores is set to 8, since my CPU has 8 cores
    pool.map(start_process, data_chunks)
I understand that running this script is a process of its own, namely the main process, which finishes after all the subprocesses have finished. Obviously the main process does not eat many resources, since it only prepares the data at first and spawns the subprocesses. But will it occupy a core of its own, too? That is, will I only be able to start 7 subprocesses instead of the 8 I wanted to start above?
The core question is: can I spawn 8 subprocesses and be sure that they will run correctly in parallel with each other?
By the way, the subprocesses do not interact with each other in any way, and when they are finished, each one generates an SQLite database file where it stores its results. So even the result storage is handled separately.
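To illustrate, here is a minimal sketch of such a worker; the filename scheme and table layout are simplified placeholders, not my actual code:

import os
import sqlite3

def start_process(chunk):
    # Each worker writes to its own file, keyed by its process ID,
    # so no two processes ever open the same database.
    con = sqlite3.connect("results_%d.sqlite" % os.getpid())
    con.execute("CREATE TABLE IF NOT EXISTS results (value)")
    con.executemany("INSERT INTO results VALUES (?)", ((v,) for v in chunk))
    con.commit()
    con.close()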
What I want to avoid is spawning a process that hinders the others from running at full speed. I need the code to terminate in the estimated 16 hours and not in double that time just because I have more processes than cores. :-)
Multiprocessing enables the computer to utilize multiple cores of a CPU to run tasks/processes in parallel.
On Windows, press Ctrl + Shift + Esc to open Task Manager and select the Performance tab to see how many cores and logical processors your PC has.
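If you prefer to check from Python itself (and cross-platform), the standard library can report the count:

import multiprocessing
import os

print(multiprocessing.cpu_count())  # number of logical processors, e.g. 8
print(os.cpu_count())               # the same value via the os module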
Threading and asyncio both run on a single processor and therefore only execute one task at a time.
In Python, single-CPU use is caused by the global interpreter lock (GIL), which allows only one thread to hold control of the Python interpreter at any given time. The GIL was introduced to solve a memory-management problem, but as a result, pure-Python code is limited to using a single processor.
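A minimal demonstration of the difference: the same CPU-bound function mapped over a thread pool barely speeds up because of the GIL, while a process pool uses the cores in parallel (exact timings will vary by machine):

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def burn(n):
    # Pure-Python, CPU-bound work: threads serialize on the GIL here.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [5_000_000] * 4
    for label, make_pool in (("threads", ThreadPool), ("processes", Pool)):
        start = time.perf_counter()
        with make_pool(4) as pool:
            pool.map(burn, work)
        print(label, "%.2fs" % (time.perf_counter() - start))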
The OS controls which processes get assigned to which core. Because other applications' processes are running as well, you cannot guarantee that all 8 cores are available to your application.
The main thread will keep its own process, but because the map() call blocks until all workers are done, the main process is likely to be blocked as well, not using any CPU core.
As an aside, if you create a Pool without arguments, it will deduce the number of available cores automatically, using the result of cpu_count().
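Putting both points together, a small sketch (start_process here is a trivial stand-in): Pool() sizes itself from cpu_count(), and map() blocks the main process until every worker has returned:

import multiprocessing

def start_process(chunk):
    return sum(chunk)  # stand-in for the real 18-hour job

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:  # no argument: uses cpu_count() workers
        results = pool.map(start_process, [[1, 2], [3, 4]])  # blocks here, using almost no CPU
    print(results)  # [3, 7]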
On any modern multitasking OS, no single program can generally monopolize a core and prevent other programs from running on it.
How many workers you should start depends on the characteristics of your start_process
function. The number of cores isn't the only consideration.
If each worker process uses e.g. 1/4 of the available memory, starting more than 3 will lead to lots of swapping and a general slowdown. This condition is called "memory bound".
If the worker processes do things other than pure calculations (e.g. read from or write to disk), they will have to wait a lot, since a disk is much slower than RAM; this is called "IO bound". In that case it might be worthwhile to start more than one worker per core.
If the workers are neither memory-bound nor IO-bound, they are limited by the number of cores, so one worker per core is the natural choice (see the sketch below).
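As a rough sketch of how these constraints could translate into a pool size (the helper and its parameters are hypothetical, not a standard API):

import multiprocessing

def choose_workers(io_bound=False, mem_fraction_per_worker=None):
    # Hypothetical sizing helper: cap the pool by the tightest constraint.
    n = multiprocessing.cpu_count()
    if io_bound:
        n *= 2  # workers spend much of their time waiting on disk, so oversubscribe
    if mem_fraction_per_worker:
        # leave headroom so the workers never swap:
        # e.g. 1/4 of RAM per worker allows at most 3 workers
        n = min(n, int(1 / mem_fraction_per_worker) - 1)
    return max(n, 1)

print(choose_workers())                              # CPU-bound: one worker per core
print(choose_workers(mem_fraction_per_worker=0.25))  # memory-bound: 3
print(choose_workers(io_bound=True))                 # IO-bound: twice the core count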