Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

multiprocessing.dummy in Python is not utilising 100% cpu

I am doing a machine learning project in Python, so I have to do parallel predict function, which I'm using in my program.

from multiprocessing.dummy import Pool from multiprocessing import cpu_count   def multi_predict(X, predict, *args, **kwargs):     pool = Pool(cpu_count())     results = pool.map(predict, X)     pool.close()     pool.join()     return results 

The problem is that all my CPUs loaded only on 20-40% (in sum it's 100%). I use multiprocessing.dummy because I have some problems with multiprocessing module in pickling function.

like image 454
Demyanov Avatar asked Oct 17 '14 19:10

Demyanov


People also ask

Which is better multithreading or multiprocessing Python?

If your program is IO-bound, both multithreading and multiprocessing in Python will work smoothly. However, If the code is CPU-bound and your machine has multiple cores, multiprocessing would be a better choice.

Does Python multiprocessing use multiple cores?

Key Takeaways. Python is NOT a single-threaded language. Python processes typically use a single thread because of the GIL. Despite the GIL, libraries that perform computationally heavy tasks like numpy, scipy and pytorch utilise C-based implementations under the hood, allowing the use of multiple cores.

Is multiprocessing faster than multithreading?

2-Use Cases for Multiprocessing: Multiprocessing outshines threading in cases where the program is CPU intensive and doesn't have to do any IO or user interaction. Show activity on this post. Process may have multiple threads. These threads may share memory and are the units of execution within a process.

Is multiprocessing faster than multithreading in Python?

But the creation of processes itself is a CPU heavy task and requires more time than the creation of threads. Also, processes require more resources than threads. Hence, it is always better to have multiprocessing as the second option for IO-bound tasks, with multithreading being the first.


1 Answers

When you use multiprocessing.dummy, you're using threads, not processes:

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.

That means you're restricted by the Global Interpreter Lock (GIL), and only one thread can actually execute CPU-bound operations at a time. That's going to keep you from fully utilizing your CPUs. If you want get full parallelism across all available cores, you're going to need to address the pickling issue you're hitting with multiprocessing.Pool.

Note that multiprocessing.dummy might still be useful if the work you need to parallelize is IO bound, or utilizes a C-extension that releases the GIL. For pure Python code, however, you'll need multiprocessing.

like image 147
dano Avatar answered Sep 19 '22 06:09

dano