 

Python multiprocessing: why are large chunksizes slower?

I've been profiling some code using Python's multiprocessing module (the 'job' function just squares the number).

import time
import multiprocessing

def job(x):
    return x * x  # 'job' just squares the number

if __name__ == '__main__':
    data = range(100000000)
    n = 4
    time1 = time.time()
    processes = multiprocessing.Pool(processes=n)
    results_list = processes.map(func=job, iterable=data, chunksize=10000)
    processes.close()
    time2 = time.time()
    print(time2 - time1)
    print(results_list[0:10])

One thing I found odd is that the optimal chunksize appears to be around 10k elements - this took 16 seconds on my computer. If I increase the chunksize to 100k or 200k, then it slows to 20 seconds.

Could this difference be because pickling takes longer for longer lists? A chunksize of 100 elements takes 62 seconds, which I assume is due to the extra time needed to pass the chunks back and forth between the different processes.
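A rough way to test the pickling theory is to time pickle.dumps on chunks of different sizes in isolation; this is only a sketch, and the chunk sizes below are illustrative:

import pickle
import time

# Time 100 pickles of a chunk of each size (sizes are illustrative only).
for chunk_size in (100, 10000, 100000):
    chunk = list(range(chunk_size))
    start = time.time()
    for _ in range(100):
        pickle.dumps(chunk)
    print(f"chunk of {chunk_size} items: {time.time() - start:.4f}s for 100 pickles")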

Jack Simpson asked Nov 25 '16

People also ask

Why is multiprocessing slow in Python?

This is due to the Python GIL being the bottleneck that prevents threads from running fully in parallel. The best possible CPU utilisation can be achieved by using the ProcessPoolExecutor or Process classes, which circumvent the GIL because each worker is a separate process.
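As a brief illustration of the ProcessPoolExecutor route, here is a minimal sketch (the square function and input range are just placeholders):

from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == '__main__':
    # Each worker is a separate process with its own interpreter and its own GIL,
    # so CPU-bound work can run in parallel across cores.
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(square, range(1000), chunksize=100))
    print(results[:5])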

What is Chunksize in multiprocessing?

The “chunksize” is an argument accepted by the multiprocessing pool's mapping functions when issuing many tasks; it controls how many items are bundled together and sent to each worker at a time.
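For example, Pool.imap_unordered also accepts a chunksize; a minimal sketch (the worker function and input values are placeholders):

import multiprocessing

def job(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        # chunksize=50 means each worker receives batches of 50 items at a time.
        results = list(pool.imap_unordered(job, range(1000), chunksize=50))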

Does multiprocessing speed up Python?

It is used to significantly speed up your program, especially if it has a lot of CPU-intensive tasks. In that case, multiple functions can run at the same time because each one uses a different CPU core, which in turn improves CPU utilization.
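As a rough sketch of that effect, the snippet below times the same CPU-bound function serially and with a 4-worker pool (the task size and count are arbitrary, and the actual speedup depends on the machine):

import time
import multiprocessing

def cpu_bound(n):
    # Deliberately CPU-heavy: sum of squares up to n.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    tasks = [200000] * 40

    start = time.time()
    serial = [cpu_bound(n) for n in tasks]
    print(f"serial:    {time.time() - start:.2f}s")

    start = time.time()
    with multiprocessing.Pool(processes=4) as pool:
        parallel = pool.map(cpu_bound, tasks)
    print(f"4 workers: {time.time() - start:.2f}s")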

How does Python multiprocess work?

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
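For instance, a multiprocessing.Process is started and joined the same way a threading.Thread would be; a minimal sketch:

import multiprocessing

def worker(name):
    print(f"hello from {name}")

if __name__ == '__main__':
    # Same start()/join() API as threading.Thread, but runs in a separate process.
    p = multiprocessing.Process(target=worker, args=("child process",))
    p.start()
    p.join()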


1 Answer

About optimal chunksize:

  1. Having many small chunks lets the 4 workers distribute the load more evenly, so smaller chunks are desirable.
  2. On the other hand, process-related overhead (pickling the chunk, sending it between processes, and switching contexts) is paid every time a new chunk is dispatched, so fewer chunks, and therefore larger ones, are desirable.

Since the two rules pull in opposite directions, the best chunksize lies somewhere in the middle, much like the equilibrium point on a supply-demand chart.
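One way to see that trade-off empirically is to sweep a few chunksize values against the same pool; this is only a rough sketch (the data size and chunksizes are examples, and timings will vary by machine):

import time
import multiprocessing

def job(x):
    return x * x

if __name__ == '__main__':
    data = range(10000000)  # smaller than the question's input to keep the sweep quick
    with multiprocessing.Pool(processes=4) as pool:
        for chunksize in (100, 10000, 100000):
            start = time.time()
            pool.map(job, data, chunksize=chunksize)
            print(f"chunksize={chunksize}: {time.time() - start:.2f}s")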

Adirio answered Oct 17 '22