Selecting a few rows by index from a Dask DataFrame?

Question

import dask.dataframe as dd

df = dd.read_csv('csv', usecols=fields, skip_blank_lines=True)
len(df.iloc[0:5])

The above code raises

AttributeError: 'DataFrame' object has no attribute 'iloc'

I also tried ix and loc, but was unable to select rows based on the index.

MRocklin · Accepted Answer

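Dask.dataframe does not support iloc for selecting rows (newer versions accept iloc only for selecting columns, as in df.iloc[:, [0, 1]]). Generally it's quite hard to access any particular row in a CSV file without first reading the file up to that point: rows have variable lengths, so the location of row N isn't known until every row before it has been parsed.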

However, if you only want a few rows from the top, I recommend using the .head() method, which by default only reads the first partition:

>>> df.head()

scottlittle · Answer

One workaround is to store the index as a column in your CSV file (for example, df_index) and filter on it like so:

selection = df[df['df_index'].isin(list_of_indexes)].compute()


Tags:

dask

madnavs
