I have a Dask DataFrame with an index on one of the columns. The issue is that df.head() always returns an empty DataFrame, whereas df.tail() always returns the correct rows. I checked, and df.head() only looks for the first n entries in the first partition. So if I call df.reset_index(), it should work, but that's not the case.
Below is the code to reproduce this:
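import dask.dataframe as dd
import numpy as np
import pandas as pd

# Build a 1000-row pandas frame: two integer columns and one string column.
data = pd.DataFrame({
    'i64': np.arange(1000, dtype=np.int64),
    'Ii32': np.arange(1000, dtype=np.int32),
    'bhello': np.random.choice(['hello', 'Yo', 'people'], size=1000).astype("O")
})
# Split into partitions of 3 rows each, then index on the string column.
daskDf = dd.from_pandas(data, chunksize=3)
daskDf = daskDf.set_index('bhello')
print(daskDf.head())  # unexpectedly prints an empty DataFrame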
DataFrame.empty property: indicates whether the DataFrame is empty. It is True if the DataFrame is entirely empty (no items), meaning any of the axes are of length 0. Syntax: DataFrame.empty. Returns: bool, True if the DataFrame is empty, False otherwise.
You can inspect the content of the Dask DataFrame with the compute() method. This is quite similar to the syntax for reading CSV files into pandas DataFrames. The Dask DataFrame API was intentionally designed to look and feel just like the pandas API.
Lazy evaluation. Most Dask collections, including Dask DataFrame, are evaluated lazily, which means Dask constructs the logic of your computation (called a task graph) immediately but evaluates it only when necessary.
You can use the attribute df.empty to check whether it's empty or not: if df.empty: print('DataFrame is empty!')
Try calling head with npartitions=-1 to use all partitions (by default, only the first is used, and there may not be enough elements to return the head).
daskDf.head(npartitions=-1)
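To see why the first partition matters, here is a minimal pure-Python sketch of the behaviour described above. It is a simplified model, not Dask's actual implementation: the function head_from_partitions and the sample partitions list are hypothetical, and plain lists stand in for partitions. It assumes that indexing on a column with only three distinct values can leave the leading partitions empty.

```python
from typing import List, Sequence


def head_from_partitions(
    partitions: Sequence[List[str]], n: int = 5, npartitions: int = 1
) -> List[str]:
    """Toy model of head(): collect up to n rows from the first `npartitions` partitions."""
    if npartitions == -1:
        # -1 means "look at every partition".
        npartitions = len(partitions)
    rows: List[str] = []
    for part in partitions[:npartitions]:
        # Take at most n rows from each inspected partition.
        rows.extend(part[:n])
    # Trim the combined rows back down to n.
    return rows[:n]


# Hypothetical layout: the leading partitions ended up empty.
partitions = [[], [], ["Yo", "Yo", "hello"], ["hello", "people", "people"]]

print(head_from_partitions(partitions))                  # only the first partition is inspected
print(head_from_partitions(partitions, npartitions=-1))  # all partitions are inspected
```

In this model the default call returns nothing because the only partition it inspects is empty, while npartitions=-1 keeps reading until it has found n rows or run out of partitions.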