parallel dask for loop slower than regular loop?

Tags:

If I try to parallelize a for loop with dask, it ends up executing slower than the regular version. Basically, I just follow the introductory example from the dask tutorial, but for some reason it's failing on my end. What am I doing wrong?

In [1]: import numpy as np
   ...: from dask import delayed, compute
   ...: import dask.multiprocessing

In [2]: a10e4 = np.random.rand(10000, 11).astype(np.float16)
   ...: b10e4 = np.random.rand(10000, 11).astype(np.float16)

In [3]: def subtract(a, b):
   ...:     return a - b

In [4]: %%timeit
   ...: results = [subtract(a10e4, b10e4[index]) for index in range(len(b10e4))]
1 loop, best of 3: 10.6 s per loop

In [5]: %%timeit
   ...: values = [delayed(subtract)(a10e4, b10e4[index]) for index in range(len(b10e4)) ]
   ...: resultsDask = compute(*values, get=dask.multiprocessing.get)
1 loop, best of 3: 14.4 s per loop

569

asked Feb 12 '18 15:02

mistakeNot

1 Answers

Two issues:

Dask introduces about a millisecond of overhead per task. You'll want to ensure that your computations take significantly longer than that.
When using the multiprocessing scheduler data gets serialized between processes, which can be quite expensive. See http://dask.pydata.org/en/latest/setup.html

answered Sep 17 '22 03:09

MRocklin

Related questions
                            
                                How do I change font in Jupyter Notebook
                            
                                python: bookkeeping dependencies in cached attributes that might change
                            
                                Format OCR text annotation from Cloud Vision API in Python
                            
                                How to test redirection in Django using pytest?
                            
                                How do I unpack a SQL Server DATETIME in a pyodbc Output Converter function?
                            
                                create Pandas Dataframe with unique index
                            
                                Renaming a variable in a Python code in MS VS Code
                            
                                run async while loop independently
                            
                                ImportError: cannot import name '_remove_dead_weakref'
                            
                                Regex or split in python for shell awk equivalent
                            
                                Imbalanced Dataset Using Keras
                            
                                Bird's eye view perspective transformation from camera calibration opencv python
                            
                                Correct way to implement piecewise function in pandas / numpy
                            
                                Construct (N+1)-dimensional diagonal matrix from values in N-dimensional array
                            
                                pandas groupby and adding new column
                            
                                PyAutoGui - Press key for X seconds
                            
                                invalid group reference when using re.sub()
                            
                                Join related models in django rest framework
                            
                                Numpy print a 1d array as a column
                            
                                Change all column names in chained operation

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

parallel dask for loop slower than regular loop?

Tags:

python

parallel-processing

numpy

dask

mistakeNot

People also ask

1 Answers

MRocklin

Recent Activity

Donate For Us