I have a few basic questions on Dask: <ol> <li>Is it correct that I have to use Futures when I want to use Dask for distributed computations (i.e. on a cluster)?</li> <li>In that case, i.e. when working with Futures, are task graphs still the way to reason about computations? If yes, how do I create them?</li> <li>How can I get the dictionary associated with a task graph in general, i.e. regardless of whether I am working with a Future or a Delayed?</li> </ol> Edit: My application is that I want to parallelize a for loop, either on my local machine or on a cluster (i.e. it should also work on a cluster). Second edit: I think I am also somewhat unclear about the relation between Futures and delayed computations. Thanks.

Dask: delayed vs futures and task graph generation [closed]

1 Answer

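1) Yup, more or less. If you're sending work over a network, you need some way of asking the machine doing the computing how the number-crunching is coming along, and Futures represent more or less exactly that. You don't have to create them by hand, though: once a dask.distributed Client exists, calling compute() on a Delayed or a collection sends the work to the cluster, with Futures handled under the hood.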
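
For the for-loop use case mentioned in the question, a minimal sketch with dask.delayed might look like the following. The process function and the input range are hypothetical stand-ins for the real loop body and data.

```python
from dask import delayed


def process(item):
    # Hypothetical per-iteration work; stands in for the real loop body.
    return item * 2


# Build one lazy task per loop iteration; nothing has run at this point.
tasks = [delayed(process)(i) for i in range(5)]

# Combine the per-iteration results in one final lazy task.
total = delayed(sum)(tasks)

# Trigger execution of the whole graph and fetch the result.
result = total.compute()
print(result)
```

With no Client around, compute() falls back to Dask's local threaded scheduler for Delayed objects; creating a dask.distributed Client first makes the same call run on whatever cluster that Client is connected to.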

2) No. With Futures, you're executing the functions eagerly: the computations are started as soon as possible, and you then wait for the results to come back (from another thread or process locally, or from a remote machine you've offloaded the job onto). The relevant abstraction here would be a queue (a priority queue, specifically).
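
The eager behaviour can be sketched with the Futures interface of dask.distributed. This is only an illustration: process is a hypothetical function, and Client(processes=False) starts a throwaway in-process cluster, an assumption made so the snippet runs on a single machine. On a real cluster you would pass the scheduler address to Client instead.

```python
from dask.distributed import Client


def process(item):
    # Hypothetical per-iteration work.
    return item * 2


# Assumption: an in-process local cluster, used here for illustration only.
with Client(processes=False) as client:
    # submit() schedules each call immediately and returns a Future.
    futures = [client.submit(process, i) for i in range(5)]

    # gather() blocks until every Future is done and keeps the input order.
    results = client.gather(futures)

print(results)
```

Nothing here builds a graph up front; each submit() call hands one piece of work to the scheduler straight away.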

3) For a Delayed instance, you could use some_delayed.dask, and for an Array, some_array.dask; optionally wrap the result in dict() to get a plain mapping of keys to tasks (vars() shows the attributes of the graph object itself rather than the tasks). I don't know for sure if it's reliably set up this way for every single API, though (I would assume so, but you know what they say about what assuming makes of the two of us...).
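
As a small sketch of the .dask approach for a Delayed object (inc and add are hypothetical helper functions, and the exact form of the keys and task values depends on the Dask version):

```python
from dask import delayed


def inc(x):
    # Hypothetical helper.
    return x + 1


def add(a, b):
    # Hypothetical helper.
    return a + b


a = delayed(inc)(1)
b = delayed(inc)(2)
c = delayed(add)(a, b)

# .dask is a mapping-like graph object; dict() turns it into a plain dict.
graph = dict(c.dask)

# Each key names one task in the graph.
for key in graph:
    print(key)

result = c.compute()
```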

4) The simplest analogy would probably be: Delayed is essentially a fancy Python yield wrapper over a function; Future is essentially a fancy async/await wrapper over a function.
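
The analogy can be made concrete with plain Python, no Dask involved. This is a loose illustration only: a generator stands in for Delayed and an asyncio task stands in for a Future, and all names are made up for the example.

```python
import asyncio

log = []


def lazy_square(x):
    # Generator function: calling it runs none of the body.
    log.append(f"lazy start {x}")
    yield x * x


async def eager_square(x):
    log.append(f"eager start {x}")
    await asyncio.sleep(0)
    return x * x


gen = lazy_square(3)
ran_before_next = bool(log)   # the generator body has not started yet
lazy_result = next(gen)       # the work happens only when it is asked for


async def main():
    # create_task() schedules the coroutine right away.
    task = asyncio.create_task(eager_square(4))
    await asyncio.sleep(0)    # give the event loop one turn
    started = "eager start 4" in log
    return started, await task


started_before_await, eager_result = asyncio.run(main())
print(ran_before_next, lazy_result, started_before_await, eager_result)
```

The generator does nothing until next() is called, much like a Delayed does nothing until compute(); the task is already running by the time its result is awaited, much like a Future.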

129

answered Oct 04 '22 19:10

jkm


Tags:

python

distributed-computing

dask

clog14
