 

How to see progress of Dask compute task?

I would like to see a progress bar in a Jupyter notebook while I run a compute task with Dask. I'm counting all the values of the id column from a large (4+ GB) CSV file. Any ideas how to show progress?

import dask.dataframe as dd

df = dd.read_csv('data/train.csv')
df.id.count().compute()
asked Feb 28 '18 by ambigus9

People also ask

What is compute in Dask?

Dask DataFrames are composed of a collection of underlying pandas DataFrames (partitions). compute() concatenates all the Dask DataFrame partitions into a single pandas DataFrame.
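
A minimal sketch of that behavior (the small DataFrame here is just for illustration):

import dask.dataframe as dd
import pandas as pd

# Split a pandas DataFrame into two Dask partitions.
pdf = pd.DataFrame({'id': range(10)})
ddf = dd.from_pandas(pdf, npartitions=2)

# compute() runs the task graph and concatenates the partitions
# back into a single in-memory pandas DataFrame.
result = ddf.compute()
print(type(result))  # <class 'pandas.core.frame.DataFrame'>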

How does Dask delayed work?

The Dask delayed function decorates your functions so that they operate lazily. Rather than executing your function immediately, it defers execution, placing the function and its arguments into a task graph. It wraps a function or object to produce a Delayed object.
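
A small sketch of that lazy execution (the add function is made up for illustration):

import dask
from dask import delayed

@delayed
def add(x, y):
    # Calling this does not run it; it just records a task in the graph.
    return x + y

total = add(add(1, 2), 3)   # a Delayed object, nothing computed yet
print(total.compute())      # executes the task graph and prints 6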

What does Dask persist do?

You can store DataFrames in memory with Dask persist which will make downstream queries that depend on the persisted data faster. This is great when you perform some expensive computations and want to save the results in memory so they're not rerun multiple times.
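
For example, a rough sketch of persisting an intermediate result (the filter is illustrative):

import dask.dataframe as dd
import pandas as pd

ddf = dd.from_pandas(pd.DataFrame({'id': range(100)}), npartitions=4)

# Run the filter once and keep the resulting partitions in memory.
filtered = ddf[ddf.id > 10].persist()

# Downstream queries reuse the persisted data instead of recomputing the filter.
print(filtered.id.count().compute())
print(filtered.id.sum().compute())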


1 Answer

If you're using the single machine scheduler then do this:

from dask.diagnostics import ProgressBar

ProgressBar().register()

http://dask.pydata.org/en/latest/diagnostics-local.html
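
Put together with the question's snippet (same CSV path as the asker's), that might look like:

import dask.dataframe as dd
from dask.diagnostics import ProgressBar

# Register once; every subsequent compute() on this machine shows a progress bar.
ProgressBar().register()

df = dd.read_csv('data/train.csv')
print(df.id.count().compute())

ProgressBar can also be used as a context manager (with ProgressBar(): ...) if you only want it around a single compute.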

If you're using the distributed scheduler then do this:

from dask.distributed import progress

result = df.id.count().persist()
progress(result)
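
A fuller sketch of the distributed path, assuming a local Client (the cluster setup here is illustrative):

import dask.dataframe as dd
from dask.distributed import Client, progress

client = Client()  # starts a local cluster and serves the diagnostic dashboard

df = dd.read_csv('data/train.csv')
result = df.id.count().persist()   # start the computation in the background
progress(result)                   # shows a progress bar (a widget in Jupyter)
print(result.compute())            # block until done and fetch the value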

Or just use the dashboard:

http://dask.pydata.org/en/latest/diagnostics-distributed.html

answered Sep 17 '22 by MRocklin