
Can't drop columns or slice dataframe using dask?

Tags:

dask

I am trying to use dask instead of pandas because I have a 2.6 GB CSV file. I load it and want to drop a column, but it seems that neither the drop method, df.drop('column'), nor slicing, df[:, :-1], is implemented yet. Is this the case, or am I just missing something?

chrisfs asked Aug 07 '15 00:08


1 Answer

We implemented the drop method in this PR. This is available as of dask 0.7.0.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})

In [3]: import dask.dataframe as dd

In [4]: ddf = dd.from_pandas(df, npartitions=2)

In [5]: ddf.drop('y', axis=1).compute()
Out[5]: 
   x
0  1
1  2
2  3

Before drop was available, one could get the same result by selecting the columns to keep by name, though this is less attractive if you have many columns.

In [6]: ddf[['x']].compute()
Out[6]: 
   x
0  1
1  2
2  3
MRocklin answered Nov 30 '22 06:11