Python Pandas - Deleting multiple series from a data frame in one command

Tags: python, pandas

In short ... I have a Python Pandas data frame that is read in from an Excel file using 'read_table'. I would like to keep a handful of the series from the data, and purge the rest. I know that I can just delete what I don't want one-by-one using 'del data['SeriesName']', but what I'd rather do is specify what to keep instead of specifying what to delete.

If the simplest answer is to copy the existing data frame into a new data frame that only contains the series I want, and then delete the existing frame in its entirety, I would be satisfied with that solution ... but if that is indeed the best way, can someone walk me through it?

TIA ... I'm a newb to Pandas. :)

Grant M. asked Jan 16 '13 16:01


2 Answers

You can use the DataFrame drop method to remove columns. Pass axis=1 so that it operates on columns rather than rows. Note that drop returns a new DataFrame instead of modifying the original, so you have to assign the result:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(dict(x=[0,0,1,0,1], y=[1,0,1,1,0], z=[0,0,1,0,1]))

In [3]: df
Out[3]:
   x  y  z
0  0  1  0
1  0  0  0
2  1  1  1
3  0  1  0
4  1  0  1

In [4]: df = df.drop(['x','y'], axis=1)

In [5]: df
Out[5]:
   z
0  0
1  0
2  1
3  0
4  1
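
For what it's worth, here is a small sketch of the same drop written with the columns= keyword, which should be equivalent to passing the labels with axis=1. It assumes a reasonably recent pandas (0.21 or later, where that keyword is available); the name trimmed is just an illustrative choice. It also shows that the original frame is left untouched.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1], "y": [1, 0, 1, 1, 0], "z": [0, 0, 1, 0, 1]})

# columns=[...] is shorthand for passing the labels together with axis=1
# (assumes pandas 0.21 or later)
trimmed = df.drop(columns=["x", "y"])

# drop returned a new frame; df itself still has all three columns
print(list(trimmed.columns))
print(list(df.columns))
```

If you want to overwrite the original, assign the result back to df as in the session above.
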
Zelazny7 answered Sep 23 '22 00:09


Basically the same as Zelazny7's answer -- just specifying what to keep:

In [68]: df
Out[68]: 
   x  y  z
0  0  1  0
1  0  0  0
2  1  1  1
3  0  1  0
4  1  0  1

In [70]: df = df[['x','z']]

In [71]: df
Out[71]: 
   x  z
0  0  0
1  0  0
2  1  1
3  0  0
4  1  1
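
Since the question asked for a walk-through of copying the wanted columns into a new frame and then discarding the old one, here is one way that might look. Treat it as a sketch: the names keep, subset and lenient are made up for illustration, and not_a_column is a deliberately missing label.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1], "y": [1, 0, 1, 1, 0], "z": [0, 0, 1, 0, 1]})

keep = ["x", "z"]  # hypothetical list of the columns to retain

# Selecting with a list of labels returns a new DataFrame;
# .copy() makes it explicitly independent of the original
subset = df[keep].copy()

# filter(items=...) skips labels that are not present,
# whereas df[[...]] raises a KeyError for a missing label
lenient = df.filter(items=["x", "z", "not_a_column"])

# Drop the reference to the original frame once it is no longer needed
del df
```

Selecting with df[keep] is the stricter option, which is usually what you want if a misspelled column name should fail loudly.
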

*Edit*

You can specify a large number of columns by indexing or slicing into the DataFrame.columns object.
This object, of type pandas.Index, behaves like an immutable, ordered array of column labels (with some extended functionality), so it can be indexed and sliced by position.

See this extension of the examples above:

In [4]: df.columns
Out[4]: Index([x, y, z], dtype=object)

In [5]: df[df.columns[1:]]
Out[5]: 
   y  z
0  1  0
1  0  0
2  1  1
3  1  0
4  0  1

In [7]: df.drop(df.columns[1:], axis=1)
Out[7]: 
   x
0  0
1  0
2  1
3  0
4  1
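
Along the same lines, the columns Index also supports set-style operations, so you can work out what to drop from a list of what to keep. A minimal sketch, assuming the same toy frame as above (keep, to_drop and result are illustrative names, and columns= assumes pandas 0.21 or later):

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1], "y": [1, 0, 1, 1, 0], "z": [0, 0, 1, 0, 1]})

keep = ["x", "z"]  # hypothetical list of the columns to retain

# Index.difference returns the labels in df.columns that are not in keep
to_drop = df.columns.difference(keep)

# Dropping everything else leaves only the wanted columns, in their original order
result = df.drop(columns=to_drop)

print(list(to_drop))
print(list(result.columns))
```

This can be handy when the frame has many columns and only a few are worth keeping.
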
tzelleke answered Sep 23 '22 00:09