Are there any guidelines to follow when choosing number of processes with multiprocessing?

I'm just getting my feet wet with multiprocessing (and it's totally awesome!), but I was wondering if there were any guidelines for selecting the number of processes. Is it just based on the number of cores on the server? Is it somehow based on the application you're running (number of loops, how much CPU it uses, etc.)? How do I decide how many processes to spawn? Right now, I'm just guessing and adding/removing processes, but it would be great if there was some kind of guideline or best practice.

Another question: I know what happens if I add too few (the program is slooow), but what if I add 'too many'?

Thanks!

asked Feb 20 '12 02:02

Lostsoul

2 Answers

If all of your threads/processes are indeed CPU-bound, you should run as many processes as the CPU reports cores. Due to HyperThreading, each physical CPU core may present multiple virtual cores. Call multiprocessing.cpu_count() to get the number of virtual cores.
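As a minimal sketch of that starting point, the snippet below sizes a pool from multiprocessing.cpu_count(). The function cpu_bound_task and its input size are hypothetical stand-ins for real work, not part of the question.

```python
import multiprocessing


def cpu_bound_task(n):
    """Hypothetical CPU-bound workload: sum of the squares below n."""
    return sum(i * i for i in range(n))


def default_process_count():
    """Return the number of virtual (logical) cores reported by the OS."""
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    workers = default_process_count()
    # One worker per virtual core, one task per worker (illustrative sizes).
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(cpu_bound_task, [100_000] * workers)
    print(workers, "workers finished", len(results), "tasks")
```

Calling multiprocessing.Pool() with no argument should give the same worker count on most systems, since it defaults to the number of CPUs the OS reports.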

If only a fraction p of your threads is CPU-bound, you can adjust that number by multiplying by p. For example, if half your processes are CPU-bound (p = 0.5) and you have two CPUs with 4 cores each and 2x HyperThreading, you should start 0.5 * 2 * 4 * 2 = 8 processes.
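The arithmetic above can be written as a small helper. This is only a sketch of this answer's rule of thumb; the function name, parameter names, and the rounding and minimum-of-one choices are assumptions made for illustration.

```python
def suggested_processes(cpu_bound_fraction, sockets, cores_per_socket, threads_per_core):
    """Apply the rule of thumb: p * (total virtual cores), at least 1.

    All parameter names are hypothetical; rounding to the nearest
    integer is an assumption, since the answer only gives a whole-number case.
    """
    if not 0 < cpu_bound_fraction <= 1:
        raise ValueError("cpu_bound_fraction must be in (0, 1]")
    virtual_cores = sockets * cores_per_socket * threads_per_core
    return max(1, round(cpu_bound_fraction * virtual_cores))


if __name__ == "__main__":
    # The example from the answer: p = 0.5, two CPUs, 4 cores each, 2x HyperThreading.
    print(suggested_processes(0.5, 2, 4, 2))  # prints 8
```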

If you have too few processes, your application will run slower than expected. If your application scales perfectly and is only CPU-bound (i.e. is 10 times faster when executed on 10 times the number of cores), the slowdown is proportional. For example, if your system calls for 8 processes, but you only start 4, you'll use only half of the processing capacity and take twice as long. Note that in practice, no application scales perfectly, but some (ray tracing, video encoding) come pretty close.

If you have too many processes, the synchronization overhead will increase. If your program has little to no synchronization overhead, this won't impact the overall runtime, but may make other programs appear slower than they are unless you set your processes to a lower priority. Excessive numbers of processes (say, 10000) are fine in theory if your OS has a good scheduler. In practice, with that many processes, virtually any synchronization will make the overhead unbearable.
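One possible way to lower the priority of worker processes is a pool initializer that calls os.nice, sketched below. It assumes a Unix-like system (os.nice does not exist on Windows); the increment of 10, the pool size of 4, and the square function are arbitrary choices for illustration.

```python
import multiprocessing
import os


def lower_priority():
    """Pool initializer: raise the niceness of the calling process where supported."""
    if hasattr(os, "nice"):
        os.nice(10)  # 10 is an arbitrary increment chosen for illustration


def square(x):
    """Hypothetical stand-in for real work."""
    return x * x


if __name__ == "__main__":
    # Each worker runs lower_priority() once when it starts.
    with multiprocessing.Pool(processes=4, initializer=lower_priority) as pool:
        print(pool.map(square, range(8)))
```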

If you're not sure whether your application is CPU-bound and/or perfectly scaling, simply observe system load with different thread/process counts. You want the CPU utilization to be slightly under 100%, or, more precisely, the load average reported by uptime to be about the number of virtual cores.
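The load average can also be read from Python while the program runs. The sketch below assumes a Unix-like system, since os.getloadavg is unavailable on Windows; the helper name load_per_core is hypothetical.

```python
import os


def load_per_core():
    """Return the 1-minute load average divided by the virtual core count.

    Returns None where the load average cannot be obtained (e.g. Windows).
    """
    try:
        one_minute, _, _ = os.getloadavg()
    except (AttributeError, OSError):
        return None
    return one_minute / (os.cpu_count() or 1)


if __name__ == "__main__":
    ratio = load_per_core()
    if ratio is None:
        print("load average not available on this platform")
    else:
        # A value near 1.0 means the load roughly matches the number of virtual cores.
        print(f"load per virtual core: {ratio:.2f}")
```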

answered Oct 18 '22 03:10

phihag

It's definitely based on what the application does. If it's CPU-heavy, the number of cores is a sane starting point. If it's IO-heavy, multiple processes won't help performance anyway. If it's mostly CPU with occasional IO (e.g. PNG optimisation), you can run a few processes more than the number of cores.

The only way to know for certain is to run your application with some realistic input and check the resource utilisation. If you have CPU time to spare, add more worker processes.
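A rough way to run that experiment is to time the same batch of work with several pool sizes. In this sketch, busy_work and the task sizes are hypothetical placeholders for your real workload, and the candidate counts (powers of two, the core count, and two more than the core count) are just one possible choice.

```python
import multiprocessing
import time


def busy_work(n):
    """Hypothetical CPU-bound task; replace with realistic input."""
    return sum(i * i for i in range(n))


def candidate_worker_counts(cores):
    """Powers of two below cores, plus cores itself and cores + 2."""
    counts = set()
    n = 1
    while n < cores:
        counts.add(n)
        n *= 2
    counts.update({cores, cores + 2})
    return sorted(counts)


def time_pool(workers, tasks):
    """Return the wall-clock seconds needed to map busy_work over tasks."""
    start = time.perf_counter()
    with multiprocessing.Pool(processes=workers) as pool:
        pool.map(busy_work, tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    tasks = [100_000] * 16
    for workers in candidate_worker_counts(multiprocessing.cpu_count()):
        print(f"{workers:3d} workers: {time_pool(workers, tasks):.2f}s")
```

If the timings stop improving before you reach the core count, the workload is probably not purely CPU-bound, or the per-task overhead dominates.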

answered Oct 18 '22 03:10

millimoose

Tags: python, parallel-processing, multiprocessing
