Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dask: delayed vs futures and task graph generation [closed]

I have a few basic questions on Dask:

  1. Is it correct that I have to use Futures when I want to use dask for distributed computations (i.e. on a cluster)?
  2. In that case, i.e. when working with futures, are task graphs still the way to reason about computations. If yes, how do I create them.
  3. How can I generally, i.e. no matter if working with a future or with a delayed, get the dictionary associated with a task graph?

As an edit: My application is that I want to parallelize a for loop either on my local machine or on a cluster (i.e. it should work on a cluster).

As a second edit: I think I am also somewhat unclear regarding the relation between Futures and delayed computations.

Thx

like image 916
clog14 Avatar asked Jan 17 '19 08:01

clog14


People also ask

What does delayed mean in Dask?

The Dask delayed function decorates your functions so that they operate lazily. Rather than executing your function immediately, it will defer execution, placing the function and its arguments into a task graph. delayed ([obj, name, pure, nout, traverse]) Wraps a function or object to produce a Delayed .

What are futures in Dask?

Dask supports a real-time task framework that extends Python's concurrent. futures interface. Dask futures reimplements most of the Python futures API, allowing you to scale your Python futures workflow across a Dask cluster with minimal code changes.

How do I stop Dask client?

When we create a Client object it registers itself as the default Dask scheduler. All . compute() methods will automatically start using the distributed system. We can stop this behavior by using the set_as_default=False keyword argument when starting the Client.


1 Answers

1) Yup. If you're sending the data through a network, you have to have some way of asking the computer doing the computing for you how's that number-crunching coming along, and Futures represent more or less exactly that.

2) No. With Futures, you're executing the functions eagerly - spinning up the computations as soon as you can, then waiting for the results to come back (from another thread/process locally, or from some remote you've offloaded the job onto). The relevant abstraction here would be a Queque (Priority Queque, specifically).

3) For a Delayed instance, for instance, you could do some_delayed.dask, or for an Array, Array.dask; optionally wrap the whole thing in either dict() or vars(). I don't know for sure if it's reliably set up this way for every single API, though (I would assume so, but you know what they say about what assuming makes of the two of us...).

4) The simplest analogy would probably be: Delayed is essentially a fancy Python yield wrapper over a function; Future is essentially a fancy async/await wrapper over a function.

like image 129
jkm Avatar answered Oct 04 '22 19:10

jkm