I have a trivially parallelizable task: computing results independently for many tables split across many files. I can construct lists of delayed or dask.dataframe objects (and have also tried, e.g., a dict), but I cannot get all of the results to compute at once. I can get individual results out of a dask-graph-style dictionary with .get(), but again can't easily compute all the results together. Here's a minimal example:
>>> df = dd.from_pandas(pd.DataFrame({'a': [1,2]}), npartitions=1)
>>> numbers = [df['a'].mean() for _ in range(2)]
>>> dd.compute(numbers)
([<dask.dataframe.core.Scalar at 0x7f91d1523978>,
<dask.dataframe.core.Scalar at 0x7f91d1523a58>],)
Similarly:
>>> import dask
>>> from dask import delayed
>>> @delayed
... def mean(data):
...     return sum(data) / len(data)
>>> delayed_numbers = [mean([1,2]) for _ in range(2)]
>>> dask.compute(delayed_numbers)
([Delayed('mean-0e0a0dea-fa92-470d-b06e-b639fbaacae3'),
Delayed('mean-89f2e361-03b6-4279-bef7-572ceac76324')],)
I would like to get [1.5, 1.5], which is what I would expect based on the delayed collections docs.
For my real problem, I would actually like to compute on tables in an HDF5 file, but given that I can get that to work with dask.get(), I'm pretty sure I'm already specifying the delayed / dask dataframe step correctly.
I would be interested in a solution that directly produces a dictionary, but I can also just pass a list of (key, value) tuples to dict(), which is probably not a huge performance hit.
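For reference, here is roughly what the real per-table step looks like (the file name data.h5, the table keys, and the column 'a' are placeholders):

>>> import dask.dataframe as dd
>>> keys = ['/table1', '/table2']
>>> # one lazy per-table mean; nothing is computed yet
>>> results = {k: dd.read_hdf('data.h5', key=k)['a'].mean() for k in keys}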
Compute takes many collections as separate arguments. Try splatting out your arguments as follows:
In [1]: import dask.dataframe as dd
In [2]: import pandas as pd
In [3]: df = dd.from_pandas(pd.DataFrame({'a': [1,2]}), npartitions=1)
In [4]: numbers = [df['a'].mean() for _ in range(2)]
In [5]: dd.compute(*numbers) # note the *
Out[5]: (1.5, 1.5)
Or, as might be more common:
In [6]: dd.compute(df.a.mean(), df.a.std())
Out[6]: (1.5, 0.707107)
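For the dictionary result mentioned in the question, one approach (a minimal sketch reusing the same df) is to splat out the dict's values and zip the computed results back onto the keys:

In [7]: named = {'mean': df.a.mean(), 'std': df.a.std()}

In [8]: dict(zip(named, dd.compute(*named.values())))  # keys keep insertion order
Out[8]: {'mean': 1.5, 'std': 0.707107}

The same splat also fixes the delayed example above: dask.compute(*delayed_numbers) returns (1.5, 1.5).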