Is it possible to create a dask array from a delayed value by specifying its shape with another delayed value?
My algorithm won't give me the shape of the array until pretty late in the computation.
Eventually, I will be creating some blocks with shapes specified by the intermediate results of my computation, then calling da.concatenate on all the results (or da.block, if it were more flexible).
I don't think it is too detrimental if I can't, but it would be cool if I could.
Sample code
from dask import delayed
from dask import array as da
import numpy as np
n_shape = (3, 3)
shape = delayed(n_shape, nout=2)
d_shape = (delayed(n_shape[0]), delayed(n_shape[1]))
n = delayed(np.zeros)(n_shape, dtype=float)  # np.float was removed in NumPy 1.24; use the builtin float
# this doesn't work
# da.from_delayed(n, shape=shape, dtype=float)
# this doesn't work either, but I think goes a little deeper
# into the function call
da.from_delayed(n, shape=d_shape, dtype=float)
The Dask delayed function decorates your functions so that they operate lazily. Rather than executing your function immediately, it defers execution, placing the function and its arguments into a task graph. It wraps a function or object to produce a Delayed object.
Instead of executing a function for each item in the loop in a sequential manner, Dask Delayed allows multiple items to be processed in parallel. With Dask Delayed each function call is queued, added to an execution graph and scheduled.
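As a minimal sketch of the lazy behaviour described above (the function name square and the input range are made up for illustration), each call only records a task, and nothing runs until compute() is called:

```python
from dask import delayed


@delayed
def square(x):
    # Not executed when called; the call is only recorded as a task in the graph.
    return x * x


# Each call returns a Delayed object immediately, without doing any work.
lazy_squares = [square(i) for i in range(5)]

# Add one more lazy task on top of the others, then run the whole graph.
lazy_total = delayed(sum)(lazy_squares)
total = lazy_total.compute()
print(total)  # 0 + 1 + 4 + 9 + 16 = 30
```

The independent square tasks can be run in parallel by the scheduler, since none of them depends on another.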
A Dask dataframe takes longer to compute (its shape or any other operation) because calling compute() executes every operation from the creation of the current dataframe, or its ancestors, up to the point where compute() is called.
Dask arrays are composed of many NumPy (or NumPy-like) arrays. How these arrays are arranged can significantly affect performance.
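To make the block structure concrete, here is a small sketch (the array and chunk sizes are arbitrary choices for illustration) that builds an array from 100 x 100 blocks and then rearranges it into larger ones:

```python
import dask.array as da

# A 1000 x 1000 array stored as a 10 x 10 grid of 100 x 100 NumPy blocks.
x = da.ones((1000, 1000), chunks=(100, 100))
print(x.numblocks)  # (10, 10)

# Rechunking rearranges the same data into fewer, larger blocks.
y = x.rechunk((500, 500))
print(y.numblocks)  # (2, 2)
```

Fewer, larger blocks mean fewer tasks for the scheduler, at the cost of more memory per task.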
You cannot provide a delayed shape, but you can state that the shape is unknown by using np.nan as the value for any dimension you don't know.
import random
import numpy as np
import dask
import dask.array as da
@dask.delayed
def f():
    return np.ones((5, random.randint(10, 20)))  # a 5 x ? array
values = [f() for _ in range(5)]
arrays = [da.from_delayed(v, shape=(5, np.nan), dtype=float) for v in values]
x = da.concatenate(arrays, axis=1)
>>> x
dask.array<concatenate, shape=(5, nan), dtype=float64, chunksize=(5, nan)>
>>> x.shape
(5, np.nan)
>>> x.compute().shape
(5, 88)
See http://dask.pydata.org/en/latest/array-chunks.html#unknown-chunks
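If later steps need concrete sizes, one option is to ask dask to work out the unknown chunk sizes with compute_chunk_sizes(), which has to compute the blocks to do so. The sketch below is a variation on the example above, not part of the original answer: make_block and the fixed widths list are hypothetical stand-ins chosen so the result is reproducible.

```python
import numpy as np
import dask
import dask.array as da


@dask.delayed
def make_block(width):
    # Stand-in for a computation whose output width is only known at run time.
    return np.ones((5, width))


widths = [10, 12, 15]  # hypothetical widths, fixed here for reproducibility
blocks = [
    da.from_delayed(make_block(w), shape=(5, np.nan), dtype=float)
    for w in widths
]
x = da.concatenate(blocks, axis=1)

# Computes the blocks once in order to fill in the unknown chunk sizes.
x = x.compute_chunk_sizes()
print(x.shape)   # (5, 37)
print(x.chunks)  # ((5,), (10, 12, 15))
```

This trades laziness for a known shape, so it is probably only worth doing when a downstream operation refuses to work with unknown chunks.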