Since .persist() caches data in the background, is it possible to wait until caching finishes before running the steps that follow? Also, is there a way to show a progress bar for the caching process? Thank you very much.
Yes, the functions you're looking for are aptly named wait and progress.
from dask.distributed import wait, progress
The progress function takes any Dask object (for example a persisted collection or a future) and renders a progress bar:
>>> progress(x)
[XXXXXXX................] 5.2 seconds
If you are in the IPython notebook, progress is non-blocking and uses IPython widgets. If you are in the IPython console or a plain Python script, progress blocks and does not return until the computation completes.
If you do not want a progress bar, or if you are in the Jupyter notebook (where progress does not block), you may want to use the wait function separately. It blocks until the computations finish.
wait(x)
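To put the pieces together, here is a minimal sketch of persisting a collection and blocking on it with wait. The in-process local cluster, the array shape and the function name persist_and_wait are assumptions made for illustration; they are not part of the original question.

```python
import dask.array as da
from dask.distributed import Client, wait


def persist_and_wait():
    # Assumption: a small in-process cluster, just for this example.
    # In real use you would connect to your own scheduler instead.
    with Client(processes=False, dashboard_address=None):
        # Hypothetical example data: a 1000 x 1000 array of ones in 16 chunks.
        x = da.ones((1000, 1000), chunks=(250, 250))

        # persist() returns right away; chunks are computed in the background.
        x = x.persist()

        # wait() blocks until every chunk backing x has finished.
        wait(x)

        # Later work now runs against data that is already in memory.
        total = x.sum().compute()
    return float(total)


if __name__ == "__main__":
    print(persist_and_wait())
```

In a notebook you could call progress(x) right after persist() to get the widget, and still call wait(x) before the next step if that step must not start early.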
http://distributed.readthedocs.io/en/latest/api.html#distributed.client.wait http://distributed.readthedocs.io/en/latest/api.html#distributed.diagnostics.progress