I'm trying to construct a dictionary in parallel using dask, but I'm running into a TypeError: Delayed objects of unspecified length are not iterable.
I'm trying to compute add, subtract, and multiply at the same time so the dictionary is constructed faster.
Here is some code that is representative of my use case:
import dask
from dask.delayed import delayed

x1 = {'a': 1, 'b': 2, 'c': 3}
x2 = {'a': 4, 'b': 5, 'c': 6}

@delayed
def add(d1, d2):
    z = {}
    z['func1_a'] = d1['a'] + d2['a']
    z['func1_b'] = d1['b'] + d2['b']
    z['func1_c'] = d1['c'] + d2['c']
    return z

@delayed
def subtract(d1, d2):
    z = {}
    z['func2_a'] = d1['a'] - d2['a']
    z['func2_b'] = d1['b'] - d2['b']
    z['func2_c'] = d1['c'] - d2['c']
    return z

@delayed
def multiply(d1, d2):
    z = {}
    z['func3_a'] = d1['a'] * d2['a']
    z['func3_b'] = d1['b'] * d2['b']
    z['func3_c'] = d1['c'] * d2['c']
    return z

@delayed
def last_step(d1, d2):
    z = {}
    z.update(add(d1, d2))
    z.update(subtract(d1, d2))
    z.update(multiply(d1, d2))
    return z
Finally, when I run:
>>> dask.compute(last_step(x1, x2))
<ipython-input-6-1153797c9d18> in final(d1, d2)
      2 def last_step(d1, d2):
      3     z = {}
----> 4     z.update(add(d1, d2))
      5     z.update(subtract(d1, d2))
      6     z.update(multiply(d1, d2))

/Users/me/anaconda3/lib/python3.6/site-packages/dask/delayed.py in __iter__(self)
    409     def __iter__(self):
    410         if getattr(self, '_length', None) is None:
--> 411             raise TypeError("Delayed objects of unspecified length are "
    412                             "not iterable")
    413         for i in range(self._length):

TypeError: Delayed objects of unspecified length are not iterable
What am I doing wrong here / failing to understand?
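Your z object is a dictionary, while the result of add is a Delayed object. The update method expects a dictionary (or at least something whose keys it can iterate over right away), and a Delayed object can't be iterated because Dask doesn't know its length until it is computed; that is where the TypeError comes from. There is no way for Python to know how to update one dict with an object-that-will-become-a-dict-when-you-call-compute. This happens even though last_step is itself delayed: the add(d1, d2) call inside its body only builds a new Delayed object when the task runs, and that object is never computed.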
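To see the failure in isolation, here is a minimal sketch. The make_dict task is a hypothetical stand-in for add/subtract/multiply, not something from the question. It shows that calling a delayed function returns a Delayed placeholder, that dict.update rejects it with a TypeError (the exact message depends on your Python version), and that computing it first yields a real dict that update accepts.

```python
import dask
from dask import delayed
from dask.delayed import Delayed


@delayed
def make_dict():
    # Hypothetical task standing in for add/subtract/multiply.
    return {'a': 1}


lazy = make_dict()                 # nothing has run yet
print(isinstance(lazy, Delayed))   # True: a placeholder, not a dict

z = {}
try:
    z.update(lazy)                 # update() tries to iterate the placeholder's keys
    caught = False
except TypeError as err:
    caught = True
    print('TypeError:', err)       # wording varies with the Python version

(concrete,) = dask.compute(lazy)   # now it really is a dict
z.update(concrete)
print(z)                           # {'a': 1}
```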
In this case I recommend making z into a delayed object:
# z = {}
z = delayed({})
z will stop being a dict; it'll be a thing-that-will-become-a-dict. This means that you can no longer check whether a key is in it, insert into it with item assignment (z[key] = value), or do any mutating operation like .update, but you can still call methods on it, which produce new delayed results. In your case I might use a pure function like toolz.merge instead.
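Following the toolz.merge suggestion, the whole pipeline might be sketched as below. Assumptions: the three helpers are condensed into dict comprehensions that produce the same keys as the question's versions, and last_step becomes an ordinary (non-delayed) function whose only job is to wire up the graph, so the merged result z is itself a Delayed. toolz is a dependency of Dask, so it should already be installed.

```python
import dask
from dask import delayed
from toolz import merge  # installed alongside dask

x1 = {'a': 1, 'b': 2, 'c': 3}
x2 = {'a': 4, 'b': 5, 'c': 6}


@delayed
def add(d1, d2):
    return {f'func1_{k}': d1[k] + d2[k] for k in d1}


@delayed
def subtract(d1, d2):
    return {f'func2_{k}': d1[k] - d2[k] for k in d1}


@delayed
def multiply(d1, d2):
    return {f'func3_{k}': d1[k] * d2[k] for k in d1}


def last_step(d1, d2):
    # Plain function: it only builds the task graph.
    # delayed(merge) wraps the pure toolz.merge; Dask computes the three
    # independent Delayed arguments first and passes merge their concrete dicts.
    return delayed(merge)(add(d1, d2), subtract(d1, d2), multiply(d1, d2))


z = last_step(x1, x2)        # still a Delayed
(result,) = dask.compute(z)  # an ordinary dict
print(result)
```

Because add, subtract and multiply don't depend on each other, Dask's default threaded scheduler is free to run them concurrently; merge only runs after all three have finished.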
Just to set expectations: dask.delayed doesn't parallelize totally arbitrary Python code. You still need to do some work to think about which objects are delayed and which are concrete.
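The delayed-versus-concrete distinction can be seen in a tiny sketch (inc, receives_concrete and calls_inside are illustrative names, not from the question). A Delayed passed as an argument to a delayed function is computed before that function's body runs, whereas a delayed call made inside a delayed function's body only produces another uncomputed Delayed, which is what happened to add(d1, d2) in last_step.

```python
import dask
from dask import delayed
from dask.delayed import Delayed


@delayed
def inc(x):
    return x + 1


@delayed
def receives_concrete(value):
    # Delayed arguments are computed before this body runs,
    # so `value` is a plain int here.
    return isinstance(value, int)


@delayed
def calls_inside():
    # Calling a delayed function inside another delayed function only
    # builds a new, uncomputed Delayed object (like add() in last_step).
    return isinstance(inc(1), Delayed)


lazy = inc(1)
print(isinstance(lazy, Delayed))  # True: graph-building time, nothing computed yet

concrete_arg, nested_is_delayed = dask.compute(receives_concrete(lazy), calls_inside())
print(concrete_arg, nested_is_delayed)  # True True
```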