dask dataframe head() returns empty df

I have a dask dataframe with an index set on one of the columns. If I call df.head(), it always returns an empty dataframe, whereas df.tail() always returns the correct result. I checked, and df.head() only looks for the first n rows in the first partition. So I expected that calling df.reset_index() first would make it work, but that is not the case.

Below is the code to reproduce this:

import dask.dataframe as dd
import numpy as np
import pandas as pd

data = pd.DataFrame({
     'i64': np.arange(1000, dtype=np.int64),
     'Ii32': np.arange(1000, dtype=np.int32),
     'bhello': np.random.choice(['hello', 'Yo', 'people'], size=1000).astype("O")
})

daskDf = dd.from_pandas(data, chunksize=3)
daskDf = daskDf.set_index('bhello')
print(daskDf.head())

asked May 25 '18 07:05

pranav kohli

1 Answer

Try calling head with npartitions=-1 to search all partitions. By default only the first partition is used, and it may not contain enough rows to fill the requested head.

daskDf.head(npartitions=-1)

answered Sep 22 '22 23:09

cs95

Tags: python, dask
