I've been profiling some code using Python's multiprocessing module (the 'job' function just squares the number).
data = range(100000000)
n=4
time1 = time.time()
processes = multiprocessing.Pool(processes=n)
results_list = processes.map(func=job, iterable=data, chunksize=10000)
processes.close()
time2 = time.time()
print(time2-time1)
print(results_list[0:10])
One thing I found odd is that the optimal chunksize appears to be around 10k elements - this took 16 seconds on my computer. If I increase the chunksize to 100k or 200k, then it slows to 20 seconds.
Could this difference be due to the amount of time required for pickling being longer for longer lists? A chunksize of 100 elements takes 62 seconds which I'm assuming is due to the extra time required to pass the chunks back and forth between different processes.
This is due to the Python GIL being the bottleneck preventing threads from running completely concurrently. The best possible CPU utilisation can be achieved by making use of the ProcessPoolExecutor or Process modules which circumvents the GIL and make code run more concurrently.
The “chunksize” is an argument specified in a function to the multiprocessing pool when issuing many tasks.
It is used to significantly speed up your program, especially if it has a lot of CPU extensive tasks. In this case, multiple functions can run together because each one will use a different CPU core which in turn will improve the CPU utilization.
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
About optimal chunksize:
As both rules want different aproaches, a point in the middle is the way to go, similar to a supply-demand chart.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With