
Dask delayed object of unspecified length not iterable error when combining dictionaries

I'm trying to construct a dictionary in parallel using dask, but I'm running into a TypeError: Delayed objects of unspecified length are not iterable.

I'm trying to compute add, subtract, and multiply at the same time so the dictionary is constructed faster.

Here is some code that is representative of my use case:

import dask
from dask.delayed import delayed

x1 = {'a': 1, 'b': 2, 'c': 3}
x2 = {'a': 4, 'b': 5, 'c': 6}

@delayed
def add(d1, d2):
    z = {}
    z['func1_a'] = d1['a'] + d2['a']
    z['func1_b'] = d1['b'] + d2['b']
    z['func1_c'] = d1['c'] + d2['c']
    return z

@delayed
def subtract(d1, d2):
    z = {}
    z['func2_a'] = d1['a'] - d2['a']
    z['func2_b'] = d1['b'] - d2['b']
    z['func2_c'] = d1['c'] - d2['c']
    return z

@delayed
def multiply(d1, d2):
    z = {}
    z['func3_a'] = d1['a'] * d2['a']
    z['func3_b'] = d1['b'] * d2['b']
    z['func3_c'] = d1['c'] * d2['c']
    return z

@delayed
def last_step(d1, d2):
    z = {}
    z.update(add(d1, d2))
    z.update(subtract(d1, d2))
    z.update(multiply(d1, d2))
    return z

Finally, when I run:

>>> dask.compute(last_step(x1, x2))

<ipython-input-6-1153797c9d18> in last_step(d1, d2)
      2 def last_step(d1, d2):
      3     z = {}
----> 4     z.update(add(d1, d2))
      5     z.update(subtract(d1, d2))
      6     z.update(multiply(d1, d2))

/Users/me/anaconda3/lib/python3.6/site-packages/dask/delayed.py in __iter__(self)
    409     def __iter__(self):
    410         if getattr(self, '_length', None) is None:
--> 411             raise TypeError("Delayed objects of unspecified length are "
    412                             "not iterable")
    413         for i in range(self._length):

TypeError: Delayed objects of unspecified length are not iterable

What am I doing wrong here / failing to understand?

asked Feb 11 '18 04:02 by blahblahblah


1 Answer

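Your z object is a dictionary, while the result of add is a Delayed object. The update method expects a concrete dictionary (or an iterable of key-value pairs), so it ends up trying to iterate over the Delayed object, which raises the TypeError you see. There is no way for Python to know how to update one dict with an object-that-will-become-a-dict-when-you-call-compute.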
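To make the mismatch concrete, here is a minimal sketch of the same failure outside the original functions. The names make_dict, lazy, is_delayed and failed are hypothetical, and the call to compute() is there only to show what a concrete result looks like; it is not a suggested fix.

```python
from dask import delayed
from dask.delayed import Delayed

@delayed
def make_dict():
    # Hypothetical stand-in for add/subtract/multiply.
    return {'a': 1}

lazy = make_dict()                      # a Delayed, not a dict
is_delayed = isinstance(lazy, Delayed)  # True

z = {}
try:
    z.update(lazy)      # dict.update cannot consume a Delayed
    failed = False
except TypeError:
    failed = True

# Once the value is concrete, update behaves as usual.
z.update(lazy.compute())
```

The exact error message may differ between Python and dask versions, so the sketch only checks for TypeError.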

In this case I recommend making z into a delayed object:

# z = {}
z = delayed({})

z will stop being a dict; it becomes a thing-that-will-become-a-dict. This means you can no longer check for keys, insert into it with item-assignment syntax (z[key] = value), or perform any mutating operation such as .update, but you can still call delayed methods on it. In your case I might use a pure function like toolz.merge.
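One way the toolz.merge suggestion could look is sketched below. This is an interpretation rather than code from the answer: it assumes last_step is left undecorated so that it only builds the task graph, and it shortens the three worker functions with dict comprehensions that produce the same keys as in the question.

```python
import dask
from dask import delayed
from toolz import merge

x1 = {'a': 1, 'b': 2, 'c': 3}
x2 = {'a': 4, 'b': 5, 'c': 6}

@delayed
def add(d1, d2):
    return {'func1_' + k: d1[k] + d2[k] for k in d1}

@delayed
def subtract(d1, d2):
    return {'func2_' + k: d1[k] - d2[k] for k in d1}

@delayed
def multiply(d1, d2):
    return {'func3_' + k: d1[k] * d2[k] for k in d1}

def last_step(d1, d2):
    # Not decorated: this function only wires tasks together.
    parts = [add(d1, d2), subtract(d1, d2), multiply(d1, d2)]
    # merge is pure; dask passes it the three computed dicts.
    return delayed(merge)(*parts)

# dask.compute returns a tuple, one entry per argument.
(result,) = dask.compute(last_step(x1, x2))
```

Because add, subtract and multiply are independent tasks feeding a single merge task, the scheduler is free to run them at the same time, and no dictionary is mutated along the way.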

Just to set expectations: dask.delayed doesn't parallelize totally arbitrary Python code. You still need to do some work to think about which objects are delayed and which are concrete.

answered Oct 27 '22 08:10 by MRocklin