I'm trying to evaluate dask by converting a method from thunder (which uses Spark) to the equivalent numpy version, but I'm not sure how to write this using dask/distributed.
In thunder, I can take a stack of images, convert it to a series, and correlate against some signal:
imgs = thunder.images.fromrandom((10, 900, 900))
series = imgs.toseries()
signal = series[5, 5, :]
correlated = series.correlate(signal)
The numpy version looks like this:
import numpy
series = numpy.random.rand(900, 900, 10)
signal = series[5, 5, :]
reshaped = series.reshape(900 * 900, 10)
# Pearson correlation of each pixel's time series with the reference signal
correlated = numpy.asarray(
    [numpy.corrcoef(x, signal)[0, 1] for x in reshaped]
)
final = correlated.reshape(900, 900)
I'm looking for some tips on how to convert this into something for distributed in particular.
Perhaps something like the following?
import dask.array as da
import numpy as np
imgs = da.random.random((10, 900, 900), chunks=(1, 900, 900))
reshaped = imgs.reshape((10, 900 * 900))
If you wanted to correlate your images against each other:
result = da.corrcoef(reshaped)
result.compute()
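Or against some other signal, giving one coefficient per image:
signal = np.random.random(900 * 900)
# np.corrcoef returns a 2x2 matrix per block; keep only the image-vs-signal entry
corr = lambda block: np.corrcoef(block, signal)[:1, 1:]
result = reshaped.map_blocks(corr, dtype=reshaped.dtype, chunks=(1, 1))
result.compute()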
However, I'm not very familiar with your application, so the response above may be flawed.
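For the per-pixel case in the question (each pixel's time series against the series at pixel (5, 5)), one option is to keep the time axis whole within each chunk and compute the Pearson coefficient directly with array operations. This is only a minimal sketch, not how thunder does it: `pearson_with_ref` is a hypothetical helper, the data is a random stand-in, and the chunk size of 100 rows is an arbitrary assumption.

```python
import dask.array as da
import numpy as np

# Stand-in data with the time axis last, as in the numpy version: (rows, cols, time).
data = np.random.default_rng(0).random((900, 900, 10))
# Chunk over rows only, so each pixel's full time series lives in a single chunk.
series = da.from_array(data, chunks=(100, 900, 10))

# Reference signal: the 10-sample time series at pixel (5, 5), pulled into memory.
signal = series[5, 5, :].compute()


def pearson_with_ref(block, ref):
    """Hypothetical helper: Pearson correlation of every time series in `block` with `ref`."""
    centered = block - block.mean(axis=-1, keepdims=True)
    ref_centered = ref - ref.mean()
    numerator = (centered * ref_centered).sum(axis=-1)
    denominator = np.sqrt((centered ** 2).sum(axis=-1) * (ref_centered ** 2).sum())
    return numerator / denominator


# The function collapses the time axis, so tell dask that axis 2 is dropped.
correlated = series.map_blocks(
    pearson_with_ref, ref=signal, dtype=series.dtype, drop_axis=2
)
final = correlated.compute()  # numpy array of shape (900, 900)
```

If a `dask.distributed.Client` has been created beforehand, the same `.compute()` call should run on that cluster instead of the local scheduler.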