When I sort one of my dataframes, e.g.:
my_df.sort(['column_A', 'column_B'])
I get:
ValueError: Cannot sort by duplicate column ['A', 'B']
The columns have different data and different names. Here is the full error:
/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/frame.pyc in sort(self, columns, column, axis, ascending, inplace)
2534 columns = column
2535 return self.sort_index(by=columns, axis=axis, ascending=ascending,
-> 2536 inplace=inplace)
2537
2538 def sort_index(self, axis=0, by=None, ascending=True, inplace=False,
/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/frame.pyc in sort_index(self, axis, by, ascending, inplace, kind)
2603 if k.ndim == 2:
2604 raise ValueError('Cannot sort by duplicate column % s'
-> 2605 % str(by))
2606 indexer = k.argsort(kind=kind)
2607 if isinstance(ascending, (tuple, list)):
ValueError: Cannot sort by duplicate column ['A', 'B']
Here is the dataframe:
> my_df.head()
db_pixel db_advertiser-campaign
0 Schnucks - Rockford GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
1 Speedway Auto Mall GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
2 Hagerstown Honda_Homepage_1.9.14 GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
3 Mitchell Gold GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
4 Gambino Realtors - PropelRETARGET GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
[5 rows x 2 columns]
Note that I am also having the error with the following command:
> my_df.head().sort(['db_pixel', 'db_advertiser-campaign'])
To drop duplicate columns from a pandas DataFrame, use df.T.drop_duplicates().T. This keeps the first column of each group with identical data and drops the rest, regardless of column names.
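A minimal sketch of that transpose trick, assuming a small made-up frame in which "b" repeats the data of "a" (the frame and its column names are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical frame: "b" holds the same data as "a" under a different name.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [4, 5, 6]})

# Transpose so columns become rows, drop the repeated rows, transpose back.
deduped = df.T.drop_duplicates().T

kept_columns = list(deduped.columns)
print(kept_columns)
```

Transposing a frame with mixed dtypes converts everything to object dtype, so on wide or mixed-type data this approach can be slow and may change the column dtypes.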
To find duplicate columns, iterate through the columns of the DataFrame and, for each one, check whether any other column has the same contents. If one does, store that column's name in a set of duplicate columns.
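The loop described above could look roughly like this. The helper name find_duplicate_columns and the sample frame are made up for illustration, and the sketch assumes the column labels themselves are unique:

```python
import pandas as pd


def find_duplicate_columns(df):
    """Return the names of columns whose contents repeat an earlier column."""
    duplicates = set()
    names = list(df.columns)
    for i, left in enumerate(names):
        if left in duplicates:
            continue  # already known to be a copy of an earlier column
        for right in names[i + 1:]:
            # Series.equals compares values (and dtype), not names.
            if df[left].equals(df[right]):
                duplicates.add(right)
    return duplicates


sample = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [4, 5, 6]})
duplicate_names = find_duplicate_columns(sample)
print(duplicate_names)
```

The returned names can then be passed to sample.drop(columns=...) to remove the repeats.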
Index objects are not required to be unique; you can have duplicate row or column labels.
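As a small illustration, assuming a made-up frame with the label "A" used twice: selecting a repeated label returns a two-dimensional DataFrame rather than a Series, which appears to be the situation the k.ndim == 2 check in the traceback guards against.

```python
import pandas as pd

# Hypothetical frame in which the label "A" appears twice.
dup_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "A", "B"])

labels_unique = dup_df.columns.is_unique                 # are all labels distinct?
duplicated_mask = dup_df.columns.duplicated().tolist()   # marks repeats after the first

# Selecting a repeated label yields every matching column at once.
selected = dup_df["A"]
selected_ndim = selected.ndim

print(labels_unique, duplicated_mask, selected_ndim)
```

Checking my_df.columns.is_unique is a quick way to rule duplicate labels in or out before looking elsewhere.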
I realized I was calling df.sort(columns=[my_columns]) instead of df.sort(columns=my_columns). In an effort to simplify the OP I didn't accurately write the exact call I was making. Sorry for the confusion.
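A short sketch of the difference, using a made-up frame and column list. It uses sort_values because DataFrame.sort was deprecated in pandas 0.17 and later removed; the explanation of why the nested list triggers the error is an inference from the traceback, not something confirmed in the thread.

```python
import pandas as pd

# Hypothetical data; my_columns is already a list of labels.
frame = pd.DataFrame({"A": [2, 1, 2, 1], "B": [1, 2, 0, 1]})
my_columns = ["A", "B"]

# Wrapping the list again produces a list with ONE element (itself a list).
nested = [my_columns]

# Selecting with a list of labels gives a 2-D result, which is presumably
# what the old sort code saw after unwrapping the single-element list.
block_ndim = frame[my_columns].values.ndim

# Passing the flat list sorts by "A", then by "B".
sorted_frame = frame.sort_values(by=my_columns)
print(sorted_frame)
```

If the column names arrive as a list, pass the list itself; only wrap a single label in brackets.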