Pandas dropping columns by index drops all columns with same name

Tags:

pandas

Consider the following dataframe, which has two columns with the same name (apparently this does happen; I currently have a dataset like this! :( )

>>> df = pd.DataFrame({"a":range(10,15),"b":range(5,10)})
>>> df.rename(columns={"b":"a"},inplace=True)
>>> df

    a   a
0   10  5
1   11  6
2   12  7
3   13  8
4   14  9

>>> df.columns
Index(['a', 'a'], dtype='object')

I would expect that when dropping by index, only the column at that index would be removed, but apparently this is not the case: both columns are dropped.

>>> df.drop(df.columns[-1], axis=1)
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]
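Why both columns disappear: `df.columns[-1]` evaluates to the label `'a'`, not to a position, and `drop` removes every column carrying that label. A minimal sketch of dropping by position instead, assuming the goal is just to remove the last column; it rebuilds the frame from the question and uses `iloc`, which selects by position rather than by label:

```python
import pandas as pd

# Rebuild the frame from the question: two columns that are both labelled "a".
df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df = df.rename(columns={"b": "a"})

# df.columns[-1] is only the label "a", not a position, so a label-based
# drop removes every column with that name.
dropped_by_label = df.drop(columns=df.columns[-1])
print(dropped_by_label.shape)  # (5, 0)

# Selecting by position with iloc leaves out only the last column.
dropped_by_position = df.iloc[:, :-1]
print(dropped_by_position)
```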

Is there a way to get rid of columns with duplicated column names?

EDIT: I chose misleading values for the first column; fixed now.

EDIT2: the expected outcome is

    a
0   10
1   11
2   12
3   13
4   14

asked Mar 04 '16 14:03

redacted

1 Answer

Actually, just do this:

In [183]:
df.ix[:,~df.columns.duplicated()]

Out[183]:
    a
0  10
1  11
2  12
3  13
4  14

This selects all rows (`:`) and then applies the boolean column mask generated by `duplicated`, inverted with `~`, so only the first occurrence of each column name is kept.
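A self-contained version of the same idea for current pandas, where `.ix` no longer exists. It uses `.loc`, which accepts a boolean array for the column axis; the data is rebuilt from the question, and the names `mask` and `deduped` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df = df.rename(columns={"b": "a"})

# Boolean mask over the columns: True for every repeat of an earlier name.
mask = df.columns.duplicated()
print(mask)  # [False  True]

# Invert it with ~ so only the first occurrence of each name survives,
# and select all rows (:) with that column mask via .loc.
deduped = df.loc[:, ~mask]
print(deduped)
```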

The output from duplicated:

In [184]:
df.columns.duplicated()

Out[184]:
array([False,  True], dtype=bool)

UPDATE

As .ix is deprecated (since v0.20.1, and removed entirely in pandas 1.0), use either of the following instead:

df.iloc[:,~df.columns.duplicated()]

df.loc[:,~df.columns.duplicated()]

Thanks to @DavideFiocco for alerting me
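`Index.duplicated` also takes a `keep` argument (`'first'` by default, which is what the answer relies on). A short sketch, going slightly beyond the original answer, of the two other settings: keep the last copy of each name instead of the first, or drop every column whose name is repeated at all:

```python
import pandas as pd

df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df = df.rename(columns={"b": "a"})

# keep="last" flags every occurrence except the last one, so inverting it
# keeps the last column named "a" (values 5..9).
keep_last = df.loc[:, ~df.columns.duplicated(keep="last")]

# keep=False flags every occurrence of a repeated name, so inverting it
# removes all columns whose name is not unique.
unique_only = df.loc[:, ~df.columns.duplicated(keep=False)]

print(keep_last)
print(unique_only.shape)  # (5, 0)
```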


answered Sep 23 '22 06:09

EdChum
