
dask dataframe head() returns empty df

Tags:

python

dask

I have a dask dataframe with an index on one of the columns. The issue is that df.head() always returns an empty dataframe, whereas df.tail() always returns the correct rows. I checked, and df.head() only looks for the first n entries in the first partition. So I expected that calling df.reset_index() would make it work, but that is not the case.

Below is the code to reproduce this:

import dask.dataframe as dd
import numpy as np
import pandas as pd

data = pd.DataFrame({
     'i64': np.arange(1000, dtype=np.int64),
     'Ii32': np.arange(1000, dtype=np.int32),
     'bhello': np.random.choice(['hello', 'Yo', 'people'], size=1000).astype("O")
})

daskDf = dd.from_pandas(data, chunksize=3)
daskDf = daskDf.set_index('bhello')
print(daskDf.head())
pranav kohli, asked May 25 '18 07:05



1 Answer

Try calling head with npartitions=-1 to use all partitions. By default only the first partition is used, and it may not contain enough elements to fill the head.

daskDf.head(npartitions=-1)
cs95, answered Sep 22 '22 23:09