
How to remove duplicate columns from a dataframe using python pandas

Tags:

python

pandas

By grouping two columns I made some changes.

I generated a file using Python, and it ended up with 2 duplicate columns. How do I remove duplicate columns from a DataFrame?

Neer asked Jun 05 '13 11:06


1 Answer

If the duplicate columns also share a name, it's probably easiest to use a groupby on the transpose:

In [11]: df
Out[11]:
   A  B  B
0  a  4  4
1  b  4  4
2  c  4  4

In [12]: df.T.groupby(level=0).first().T
Out[12]:
   A  B
0  a  4
1  b  4
2  c  4
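As a script rather than an IPython session, the groupby approach above might look like the sketch below. The frame is rebuilt by hand to match the example. The second variant, a boolean mask built with `df.columns.duplicated()`, is not part of the answer; it is an alternative that avoids the double transpose and so keeps the original dtypes. The names `by_groupby` and `by_mask` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame, including the repeated column name "B".
df = pd.DataFrame(
    [["a", 4, 4], ["b", 4, 4], ["c", 4, 4]],
    columns=["A", "B", "B"],
)

# Approach from the answer: transpose, group rows by label,
# keep the first entry of each group, transpose back.
by_groupby = df.T.groupby(level=0).first().T

# Alternative (an assumption, not from the answer): keep only the
# first occurrence of each column label.
by_mask = df.loc[:, ~df.columns.duplicated()]

print(by_groupby)
print(by_mask)
```

Both variants decide by column name only, so they are appropriate when columns with the same name are known to hold the same data.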

If they have different names you can drop_duplicates on the transpose:

In [21]: df
Out[21]:
   A  B  C
0  a  4  4
1  b  4  4
2  c  4  4

In [22]: df.T.drop_duplicates().T
Out[22]:
   A  B
0  a  4
1  b  4
2  c  4
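A minimal script version of the same idea, assuming the example frame above. One side effect worth knowing: transposing a frame with mixed dtypes yields object columns, so the sketch calls `infer_objects()` afterwards to recover numeric dtypes. That extra step is an addition here, not part of the answer.

```python
import pandas as pd

# Columns B and C have different names but identical values.
df = pd.DataFrame({"A": ["a", "b", "c"], "B": [4, 4, 4], "C": [4, 4, 4]})

# Transpose so columns become rows, drop duplicate rows
# (the first occurrence is kept), then transpose back.
deduped = df.T.drop_duplicates().T

# The round trip through an object-dtype frame loses the integer dtype;
# infer_objects() is one way to restore it.
deduped = deduped.infer_objects()

print(deduped)
print(deduped.dtypes)
```

Transposing copies the whole frame, so this may be slow on very wide or very long data.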

Note that read_csv will usually ensure they have different names, e.g. a repeated header B is read as B and B.1.
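To illustrate the renaming, here is a small sketch that reads an in-memory CSV with a repeated header. The CSV text and the variable names are made up for the example. Because the duplicates arrive with different names, the value-based drop_duplicates approach is the one that applies.

```python
import io

import pandas as pd

# Hypothetical CSV content with the header "B" repeated.
csv_text = "A,B,B\na,4,4\nb,4,4\nc,4,4\n"

csv_df = pd.read_csv(io.StringIO(csv_text))

# read_csv renames the second "B" so the labels are unique.
print(list(csv_df.columns))

# The names now differ, so compare column values on the transpose.
csv_deduped = csv_df.T.drop_duplicates().T
print(list(csv_deduped.columns))
```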

Andy Hayden answered Oct 23 '22 04:10