 

Python multiprocessing: why are large chunksizes slower?

I've been profiling some code using Python's multiprocessing module (the 'job' function just squares the number).

import time
import multiprocessing

def job(x):
    return x * x  # 'job' just squares the number

if __name__ == '__main__':
    data = range(100000000)
    n = 4
    time1 = time.time()
    processes = multiprocessing.Pool(processes=n)
    results_list = processes.map(func=job, iterable=data, chunksize=10000)
    processes.close()
    time2 = time.time()
    print(time2 - time1)
    print(results_list[0:10])

One thing I found odd is that the optimal chunksize appears to be around 10k elements - this took 16 seconds on my computer. If I increase the chunksize to 100k or 200k, then it slows to 20 seconds.

Could this difference be because pickling takes longer for longer lists? A chunksize of 100 elements takes 62 seconds, which I assume is due to the extra time needed to pass the chunks back and forth between the different processes.
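A rough way to test the pickling theory is to time pickle.dumps on chunks of different sizes in isolation; this is only a sketch, and the chunk sizes below are illustrative:

import pickle
import time

# Time 100 pickles of a chunk of each size (sizes are illustrative only).
for chunk_size in (100, 10000, 100000):
    chunk = list(range(chunk_size))
    start = time.time()
    for _ in range(100):
        pickle.dumps(chunk)
    print(f"chunk of {chunk_size} items: {time.time() - start:.4f}s for 100 pickles")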

Jack Simpson asked Nov 25 '16

People also ask

Why is multiprocessing slow in Python?

This is due to the Python GIL being the bottleneck that prevents threads from running fully in parallel. The best possible CPU utilisation can be achieved by using the ProcessPoolExecutor or Process classes, which circumvent the GIL because each worker is a separate process.
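As a brief illustration of the ProcessPoolExecutor route, here is a minimal sketch (the square function and input range are just placeholders):

from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == '__main__':
    # Each worker is a separate process with its own interpreter and its own GIL,
    # so CPU-bound work can run in parallel across cores.
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(square, range(1000), chunksize=100))
    print(results[:5])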

What is Chunksize in multiprocessing?

The “chunksize” is an argument accepted by the multiprocessing pool's mapping functions when issuing many tasks; it controls how many items are bundled together and sent to each worker at a time.
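For example, Pool.imap_unordered also accepts a chunksize; a minimal sketch (the worker function and input values are placeholders):

import multiprocessing

def job(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        # chunksize=50 means each worker receives batches of 50 items at a time.
        results = list(pool.imap_unordered(job, range(1000), chunksize=50))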

Does multiprocessing speed up Python?

It is used to significantly speed up your program, especially if it has a lot of CPU-intensive tasks. In that case, multiple functions can run at the same time because each one uses a different CPU core, which in turn improves CPU utilization.
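As a rough sketch of that effect, the snippet below times the same CPU-bound function serially and with a 4-worker pool (the task size and count are arbitrary, and the actual speedup depends on the machine):

import time
import multiprocessing

def cpu_bound(n):
    # Deliberately CPU-heavy: sum of squares up to n.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    tasks = [200000] * 40

    start = time.time()
    serial = [cpu_bound(n) for n in tasks]
    print(f"serial:    {time.time() - start:.2f}s")

    start = time.time()
    with multiprocessing.Pool(processes=4) as pool:
        parallel = pool.map(cpu_bound, tasks)
    print(f"4 workers: {time.time() - start:.2f}s")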

How does Python multiprocess work?

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
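For instance, a multiprocessing.Process is started and joined the same way a threading.Thread would be; a minimal sketch:

import multiprocessing

def worker(name):
    print(f"hello from {name}")

if __name__ == '__main__':
    # Same start()/join() API as threading.Thread, but runs in a separate process.
    p = multiprocessing.Process(target=worker, args=("child process",))
    p.start()
    p.join()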


1 Answer

About optimal chunksize:

  1. Having many small chunks lets the 4 workers distribute the load more evenly, so smaller chunks are desirable.
  2. On the other hand, process-related overhead (pickling the chunk, sending it between processes, and switching contexts) is paid every time a new chunk is dispatched, so fewer chunks, and therefore larger ones, are desirable.

Since the two rules pull in opposite directions, the best chunksize lies somewhere in the middle, much like the equilibrium point on a supply-demand chart.
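One way to see that trade-off empirically is to sweep a few chunksize values against the same pool; this is only a rough sketch (the data size and chunksizes are examples, and timings will vary by machine):

import time
import multiprocessing

def job(x):
    return x * x

if __name__ == '__main__':
    data = range(10000000)  # smaller than the question's input to keep the sweep quick
    with multiprocessing.Pool(processes=4) as pool:
        for chunksize in (100, 10000, 100000):
            start = time.time()
            pool.map(job, data, chunksize=chunksize)
            print(f"chunksize={chunksize}: {time.time() - start:.2f}s")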

Adirio answered Oct 17 '22