I'm just getting my feet wet with multiprocessing(and its totally awesome!), but I was wondering if there was any guidelines to selecting number of processes? Is it just based on number of cores on the server? Is it somehow based on the application your running(number of loops, how much cpu it uses,etc)? etc...how do I decide how many processes to spawn? Right now, I'm just guessing and add/removing processes but it would be great if there was some kind of guideline or best practice.
Another question, I know what happens if I add too few(program is slooow) but what if I add 'too many'?
Thanks!
If we are using the context manager to create the process pool so that it is automatically shutdown, then you can configure the number of processes in the same manner. The number of workers must be less than or equal to 61 if Windows is your operating system.
Meanwhile, you can get some of the benefits of multiprocessing without multiple cores. The main benefit—the reason the module was designed—is parallelism for speed. And obviously, without 4 cores, you aren't going to cut your time down to 25%.
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
The possible start methods are 'fork', 'spawn' and 'forkserver'. On Windows only 'spawn' is available. On Unix 'fork' and 'spawn' are always supported, with 'fork' being the default.
If all of your threads/processes are indeed CPU-bound, you should run as many processes as the CPU reports cores. Due to HyperThreading, each physical CPU cores may be able to present multiple virtual cores. Call multiprocessing.cpu_count
to get the number of virtual cores.
If only p of 1 of your threads is CPU-bound, you can adjust that number by multiplying by p. For example, if half your processes are CPU-bound (p = 0.5) and you have two CPUs with 4 cores each and 2x HyperThreading, you should start 0.5 * 2 * 4 * 2 = 8 processes.
If you have too few process, your application will run slower than expected. If your application scales perfectly and is only CPU-bound (i.e. is 10 times faster when executed on 10 times the amount of cores), this means you the speed is slower in relation. For example, if your system calls for 8 processes, but you only initiate 4, you'll only use half of the processing capacity and take twice as long. Note that in practice, no application scales perfectly, but some (ray tracing, video encoding) are pretty close.
If you have too many processes, the synchronization overhead will increase. If your program is little to none synchronization overhead, this won't impact the overall runtime, but may make other programs appear slower than they are unless you set your processes to a lower priority. Excessive numbers of processes (say, 10000) are fine in theory if your OS has a good scheduler. In practice, virtually any synchronization will make the overhead unbearable.
If you're not sure whether your application is CPU-bound and/or perfectly scaling, simply observe system load with different thread counts. You want the system load to be slightly under 100%, or the more precise uptime to be the number of virtual cores.
It's definitely based on what the application does. If it's CPU-heavy, the number of cores is a sane starting point. If it's IO-heavy, mulitple processes won't help performance anyway. If it's mostly CPU with occasional IO (e.g. PNG optimisation), you can run a few processes more than the number of cores.
The only way to know for certain is to run your application with some realistic input and check the resource utilisation. If you have CPU time to spare, add more worker processes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With