Pandas dropping columns by index drops all columns with same name

Tags:

pandas

Consider the following dataframe, which has columns with the same name (apparently this does happen; I currently have a dataset like this! :( )

>>> import pandas as pd
>>> df = pd.DataFrame({"a":range(10,15),"b":range(5,10)})
>>> df.rename(columns={"b":"a"},inplace=True)
>>> df

    a   a
0   10  5
1   11  6
2   12  7
3   13  8
4   14  9

>>> df.columns
Index(['a', 'a'], dtype='object')

I would expect that when dropping by index, only the column at that index would be removed, but apparently this is not the case: both columns are dropped.

>>> df.drop(df.columns[-1], axis=1)

0
1
2
3
4

Is there a way to get rid of columns with duplicated column names?

EDIT: I chose misleading values for the first column; fixed now.

EDIT2: the expected outcome is

  a
0 10
1 11
2 12 
3 13
4 14

asked Mar 04 '16 14:03


1 Answer

Actually just do this:

In [183]:
df.ix[:,~df.columns.duplicated()]

Out[183]:
    a
0  10
1  11
2  12
3  13
4  14

This selects all rows, then uses the boolean column mask returned by duplicated(), inverted with ~, so that only the first occurrence of each column name is kept.

The output from duplicated:

In [184]:
df.columns.duplicated()

Out[184]:
array([False,  True], dtype=bool)
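
As a small sketch of how the mask behaves, the snippet below rebuilds the frame from the question so it runs on its own and compares the settings of the `keep` argument of `Index.duplicated`. The variable names are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question: two columns that are both named "a".
df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df = df.rename(columns={"b": "a"})

# Default keep="first": every occurrence after the first is flagged.
mask_first = df.columns.duplicated()

# keep="last": every occurrence before the last is flagged.
mask_last = df.columns.duplicated(keep="last")

# keep=False: every column whose name occurs more than once is flagged.
mask_all = df.columns.duplicated(keep=False)

# Inverting a mask with ~ selects the columns that were NOT flagged.
kept_first = df.loc[:, ~mask_first]  # keeps the column holding 10..14
kept_last = df.loc[:, ~mask_last]    # keeps the column holding 5..9
```

With the default setting the first column of each name survives, which matches the expected outcome in the question.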

UPDATE

As .ix is deprecated (since v0.20.1) and was removed entirely in pandas 1.0, you should use either of the following instead:

df.iloc[:,~df.columns.duplicated()]

or

df.loc[:,~df.columns.duplicated()]
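
For completeness, here is a self-contained sketch of both ideas. The reason `df.drop(df.columns[-1], axis=1)` removes everything is that `df.columns[-1]` evaluates to the label `'a'`, and `drop` matches by label, so every column with that name goes. If you really want to drop one column by position, a positional boolean mask with `.iloc` avoids labels altogether. Note that `drop_by_position` is a hypothetical helper written for this example, not a pandas function.

```python
import numpy as np
import pandas as pd

# Frame from the question: both columns end up named "a".
df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df = df.rename(columns={"b": "a"})

# Keep only the first occurrence of each column name.
deduped = df.loc[:, ~df.columns.duplicated()]


def drop_by_position(frame, position):
    """Hypothetical helper: drop exactly one column by its integer position.

    `position` must be a non-negative index; labels are never consulted,
    so duplicated column names do not matter.
    """
    keep = np.arange(frame.shape[1]) != position
    return frame.iloc[:, keep]


# Drop only the last column, leaving the first "a" untouched.
without_last = drop_by_position(df, df.shape[1] - 1)

# Drop only the first column, leaving the second "a" untouched.
without_first = drop_by_position(df, 0)
```

Both `deduped` and `without_last` contain the single column with the values 10 to 14, while `without_first` contains the column with the values 5 to 9.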

Thanks to @DavideFiocco for alerting me

EdChum answered Sep 23 '22 06:09