multiprocessing not using all cores

I wrote a sample script, and am having issues after reinstalling Ubuntu 20.04. It appears that multiprocessing is only using a single core. Here is my sample script:

import random
from multiprocessing import Pool, cpu_count

def f(x): return x*x

if __name__ == '__main__':
    with Pool(32) as p:
        print(p.imap(f,random.sample(range(10, 99999999), 50000000)))

An image of my CPU usage is below. Any idea what might cause this?

[image]

negfrequency asked Jan 25 '23 02:01



1 Answer

The Pool of workers is an effective design pattern when your job can be split into separate units of work that can be distributed among multiple workers.

To do so, you need to divide your input into chunks and distribute these chunks to all the workers by some means. The multiprocessing.Pool uses OS processes as workers and OS pipes as the transport layer.

This introduces a significant overhead, which is often referred to as Inter-Process Communication (IPC) cost.

In your specific example, the main process generates a large dataset with the random.sample function. This alone takes considerable time and memory. You then send every single sample as its own task to a worker process, which performs a trivial computation on it.
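To get a feel for how much of the run time belongs to data generation alone, you could time it in isolation. This is only a rough sketch: the helper name time_sample is made up for illustration, and the sample size is reduced from the question's 50000000 so that it finishes quickly.

```python
import random
import time


def time_sample(population_end, k):
    """Draw k unique values from range(10, population_end) and time the call.

    Hypothetical helper, not part of the original script.
    """
    start = time.perf_counter()
    data = random.sample(range(10, population_end), k)
    elapsed = time.perf_counter() - start
    return data, elapsed


if __name__ == '__main__':
    # Smaller k than in the question (an assumption to keep the run short).
    data, elapsed = time_sample(99999999, 1000000)
    print(f"generated {len(data)} samples in {elapsed:.2f}s in the main process")
```

Whatever time this reports is spent in a single process before any worker receives a task, so it shows up as one busy core.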

As a result, most of the time is spent in the main process, which has to generate the large dataset, divide it into chunks of size 1 (the default chunksize for pool.imap), send each chunk to the workers, and collect the returned values. The worker processes are mostly idle, waiting for the main process to feed them work.
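One way to reduce the per-item transfer cost is to pass a larger chunksize to pool.imap, so that each task carries many items instead of one. The sketch below assumes a wrapper function squares and the values 4 workers and a chunksize of 10000; those are illustrative choices, not tuned recommendations.

```python
from multiprocessing import Pool


def f(x):
    return x * x


def squares(data, workers=4, chunksize=10000):
    """Map f over data, handing each worker chunksize items per task.

    Hypothetical wrapper; the defaults are arbitrary example values.
    """
    with Pool(workers) as pool:
        # list() consumes the lazy imap iterator before the pool is torn down.
        return list(pool.imap(f, data, chunksize=chunksize))


if __name__ == '__main__':
    out = squares(range(1000000))
    print(out[:5], len(out))
```

With a function as cheap as f, the transfer will likely still dominate, but far fewer messages cross the pipes than with the default chunksize of 1.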

If you simulate some real computation in your function f, you will notice that all cores become busy.
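A minimal sketch of such a simulation follows. The function busy_f and its rounds parameter are invented for this example: the loop does throwaway arithmetic purely to keep the CPU occupied, and the task count is an arbitrary choice.

```python
import os
from multiprocessing import Pool


def busy_f(x, rounds=200000):
    """Square x after burning CPU time to stand in for real work."""
    total = 0
    for i in range(rounds):
        total += i * i  # throwaway arithmetic, result is discarded
    return x * x


if __name__ == '__main__':
    workers = os.cpu_count() or 1
    with Pool(workers) as pool:
        # Each task now costs far more to compute than to transfer.
        results = list(pool.imap(busy_f, range(64 * workers)))
    print(len(results), results[:5])
```

While this runs, a monitor such as htop should show the worker processes spread across the cores, because computing each task now outweighs the cost of shipping it.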

noxdafox answered Jan 27 '23 17:01
