Does Dask support functions with multiple outputs in Custom Graphs?

Tags:

dask

The Custom Graphs API of Dask seems to support only functions that return a single output key/value pair.

For example, the following dependency cannot be easily represented as a Dask graph:

    B -> D
   /      \
A-         -> F
   \      /
    C -> E

This can be worked around by storing a tuple under a "composite" key (e.g. "B_C" in this case) and then splitting it by getitem() or similar. However, that can lead to inefficient execution (e.g. unnecessary serialization) and reduce the clarity of DAG visualizations.
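For concreteness, the workaround might look like the sketch below. The functions `split` and `combine`, the two lambdas, and the input value are hypothetical placeholders chosen for illustration, and `dask.threaded.get` is just one scheduler that could be used here.

```python
from operator import getitem

from dask.threaded import get


def split(a):
    # Hypothetical two-output function: produces the values for B and C.
    return a + 1, a * 2


def combine(d, e):
    # Hypothetical final step that joins both branches into F.
    return d + e


dsk = {
    "A": 3,
    "B_C": (split, "A"),            # composite key holding the tuple
    "B": (getitem, "B_C", 0),       # first element of the tuple
    "C": (getitem, "B_C", 1),       # second element of the tuple
    "D": (lambda b: b * 10, "B"),
    "E": (lambda c: c - 1, "C"),
    "F": (combine, "D", "E"),
}

result = get(dsk, "F")
print(result)  # 45
```

The extra "B_C" node and the two getitem tasks are what show up as clutter in the visualized DAG.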

Is there a better way or is this currently not supported?

asked Jul 15 '16 22:07

Petr Wolf

1 Answer

Short answer

No, but it shouldn't matter.

Programming interface

You are right that the way to manage multiple outputs with Dask is to use getitem. With dask.delayed, the standard approach is to index the delayed result, which adds a getitem task to the graph, as you suggest. Here is an example:

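from dask import delayed

@delayed(pure=True)
def minmax(a, b):
    # Return the pair ordered as (smaller, larger).
    if a > b:
        return b, a
    else:
        return a, b

result = minmax(1, 2)
# Indexing a Delayed object adds a getitem task to the graph.
# Named lo/hi to avoid shadowing the built-in min and max.
lo, hi = result[0], result[1]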
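As a variation on the explicit indexing above, `dask.delayed` also accepts an `nout` argument that lets the delayed result be unpacked like a tuple. The sketch below assumes a reasonably recent Dask version. The input values are arbitrary, and getitem tasks are still created under the hood.

```python
import dask
from dask import delayed


@delayed(nout=2)
def minmax(a, b):
    # Return the pair ordered as (smaller, larger).
    return (a, b) if a <= b else (b, a)


# Unpacking works because nout=2 tells Dask how many outputs to expect.
lo, hi = minmax(5, 2)

# Nothing has run yet; compute both outputs in a single pass.
lo_val, hi_val = dask.compute(lo, hi)
print(lo_val, hi_val)  # 2 5
```

This changes only the programming interface. The graph still contains one task for `minmax` plus one getitem task per output.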

Performance

You raise an interesting question about performance. In practice, the distributed scheduler (which also works well on a single machine) should handle this situation without a performance penalty. The same should hold for the single-machine threaded scheduler, where tasks share memory and intermediate results are not serialized.

answered Oct 22 '22 23:10

MRocklin
