I am working on multiprocessing in Python. For example, consider the example given in the Python multiprocessing documentation (I have changed 100 to 1000000 in the example, just to consume more time). When I run this, I do see that Pool() is using all 4 processes, but I don't see each CPU going up to 100%. How can I get each CPU to reach 100% usage?
    from multiprocessing import Pool

    def f(x):
        return x*x

    if __name__ == '__main__':
        pool = Pool(processes=4)
        result = pool.map(f, range(10000000))
Because of the GIL, a single Python process typically executes bytecode on only one core at a time, no matter how many threads it starts. Despite the GIL, libraries that perform computationally heavy tasks, such as numpy, scipy and pytorch, use C-based implementations under the hood that can release the GIL, allowing the use of multiple cores.
Pool is generally used for heterogeneous tasks, whereas multiprocessing.Process is generally used for homogeneous tasks. Heterogeneous tasks are tasks that do not resemble each other: for example, each task submitted to the process pool may have a different target function.
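As a minimal sketch of the two APIs (the squaring and greeting functions here are just illustrative):

    from multiprocessing import Pool, Process

    def square(x):
        return x * x

    def greet(name):
        print('hello', name)

    if __name__ == '__main__':
        # Process: one OS process you manage yourself; any target function.
        p = Process(target=greet, args=('world',))
        p.start()
        p.join()

        # Pool: a fixed set of worker processes shared by many small tasks.
        with Pool(processes=4) as pool:
            print(pool.map(square, range(10)))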
Limiting CPU and memory usage: resources such as CPU time and memory used by a Python program can be controlled using the resource library. To get the processor time (in seconds) that a process may use, we can call resource.getrlimit(resource.RLIMIT_CPU); it returns the current soft and hard limits of the resource.
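A minimal sketch of that call (Unix-only, since the resource module is not available on Windows; the 60-second limit is an arbitrary example):

    import resource

    # Query the current CPU-time limits (in seconds) for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    print(soft, hard)  # resource.RLIM_INFINITY means "unlimited"

    # Optionally lower the soft limit; the process gets SIGXCPU past it.
    resource.setrlimit(resource.RLIMIT_CPU, (60, hard))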
Meanwhile, you can get some of the benefits of multiprocessing without multiple cores. The main benefit, and the reason the module was designed, is parallelism for speed; and obviously, without 4 cores, you aren't going to cut your run time down to 25%.
It is because multiprocessing requires interprocess communication between the main process and the worker processes behind the scenes, and this communication overhead took more (wall-clock) time than the "actual" computation (x * x) in your case.
Try a "heavier" computation kernel instead, like

    import math
    from functools import reduce

    def f(x):
        return reduce(lambda a, b: math.log(a + b), range(10**5), x)
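For reference, a minimal self-contained benchmark along these lines (the 4-process pool and the 100-item input are arbitrary choices; inputs start at 1 so math.log never sees 0):

    import math
    import time
    from functools import reduce
    from multiprocessing import Pool

    def f(x):
        # CPU-heavy kernel: ~10**5 log/add operations per call.
        return reduce(lambda a, b: math.log(a + b), range(10**5), x)

    if __name__ == '__main__':
        xs = range(1, 101)

        start = time.time()
        serial = [f(x) for x in xs]
        print('serial:', time.time() - start)

        start = time.time()
        with Pool(processes=4) as pool:
            parallel = pool.map(f, xs)
        print('4-proc pool:', time.time() - start)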
I pointed out that the low CPU usage observed by the OP was due to the IPC overhead inherent in multiprocessing, but the OP didn't need to worry about it too much, because the original computation kernel was far too "light" to be used as a benchmark. In other words, multiprocessing performs at its worst with such a "light" kernel. If the OP implements real-world logic (which will surely be somewhat "heavier" than x * x) on top of multiprocessing, the OP will achieve decent efficiency, I assure you. My argument is backed up by an experiment with the "heavy" kernel presented above.

@FilipMalczak, I hope my clarification makes sense to you.
By the way, there are some ways to improve the efficiency of x * x while using multiprocessing. For example, we can combine 1,000 jobs into one before submitting them to Pool, unless we are required to solve each job in real time (i.e., if you implement a REST API server, we shouldn't batch this way).
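A minimal sketch of that batching idea (the f_batch helper and the 1,000-item batch size are illustrative; note that Pool.map's built-in chunksize argument achieves much the same effect):

    from multiprocessing import Pool

    def f(x):
        return x * x

    def f_batch(batch):
        # One IPC round trip now covers a whole batch of inputs.
        return [x * x for x in batch]

    if __name__ == '__main__':
        n = 10000000
        batches = [range(i, min(i + 1000, n)) for i in range(0, n, 1000)]

        with Pool(processes=4) as pool:
            # One task per 1,000 inputs: far fewer messages between processes.
            result = [y for ys in pool.map(f_batch, batches) for y in ys]

            # Alternatively, let Pool.map do the batching for us:
            result2 = pool.map(f, range(n), chunksize=1000)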
You're asking the wrong kind of question. multiprocessing.Process represents a process as understood by your operating system. multiprocessing.Pool is just a simple way to run several processes to do your work. The Python environment has nothing to do with balancing load across cores/processors.
If you want to control how processor time is given to processes, you should try tweaking your OS, not the Python interpreter.
Of course, "heavier" computations will be recognised by the system, and may look like they do just what you want them to, but in fact you have almost no control over process handling.
"Heavier" functions will just look heavier to your OS, and its usual reaction will be to assign more processor time to your processes. But that doesn't mean you did what you wanted to, and that's good, because that's the whole point of languages with a VM: you specify the logic, and the VM takes care of mapping that logic onto the operating system.