Asynchronous Xarray writing to Zarr

I'm using a Dask Distributed cluster to write Zarr+Dask-backed Xarray Datasets inside a loop, and dataset.to_zarr blocks. This really slows things down when a few straggler chunks hold up the next iteration. Is there a way to run .to_zarr asynchronously, so that the loop can continue with the next dataset write instead of waiting on those stragglers?

jkmacc asked Jun 09 '26 05:06


1 Answer

With the distributed scheduler, you get async behaviour without any special effort. If you call arr.to_zarr directly, it does block until the write completes. However, you can build the write lazily and submit it to the cluster yourself:

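from dask.distributed import Client

client = Client(...)
out = arr.to_zarr(..., compute=False)  # build the write lazily; nothing is computed yet
fut = client.compute(out)  # submit to the cluster; returns immediately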

This returns a future, fut, whose status reflects the current state of the whole computation; you can choose whether to wait on it or to keep submitting new work. You can also display it in a progress bar (in the notebook), which updates asynchronously whenever the kernel is not busy.
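Applied to the loop in the question, the same pattern might look like the sketch below. It is only an illustration under assumptions: submit_zarr_writes, demo, the small generated datasets, and the out_N.zarr paths are hypothetical names that are not in the original, and the local Client(processes=False) stands in for a real cluster. The helper queues every write and hands back the futures, so nothing blocks until you choose to wait.

```python
def submit_zarr_writes(client, datasets_and_stores):
    """Queue one Zarr write per dataset without waiting between writes.

    Hypothetical helper: `client` is assumed to be a dask.distributed.Client
    and `datasets_and_stores` an iterable of (xarray.Dataset, store) pairs.
    """
    futures = []
    for ds, store in datasets_and_stores:
        # compute=False returns a lazy object instead of writing the chunks now
        delayed_write = ds.to_zarr(store, mode="w", compute=False)
        # client.compute submits the work and returns a future right away
        futures.append(client.compute(delayed_write))
    return futures


def demo():
    """Illustrative run; needs dask, distributed, xarray and zarr installed."""
    import numpy as np
    import xarray as xr
    from dask.distributed import Client, wait

    # Assumption: an in-process local cluster, just for the example
    client = Client(processes=False)

    pairs = []
    for i in range(3):
        ds = xr.Dataset({"x": ("t", np.arange(100.0) * i)}).chunk({"t": 25})
        pairs.append((ds, f"out_{i}.zarr"))  # hypothetical output paths

    futures = submit_zarr_writes(client, pairs)

    wait(futures)  # block once, after all writes have been queued
    for fut in futures:
        fut.result()  # re-raises the exception if a write failed

    client.close()
```

Calling demo() runs the whole thing end to end. With compute=False, xarray still creates the store and writes its metadata up front; only the chunk data is deferred to the future. If the loop produces many large datasets, you may want to wait on older futures before submitting more, so that the cluster is not flooded with pending writes.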

mdurant answered Jun 11 '26 20:06


