
Pandas - Cannot sort by duplicate column

Tags: python, pandas

When I sort one of my dataframes, e.g.:

my_df.sort(['column_A', 'column_B'])

I get:

ValueError: Cannot sort by duplicate column ['A', 'B']

The columns have different data and different names. Here is the full error:

/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/frame.pyc in sort(self, columns, column, axis, ascending, inplace)                
   2534             columns = column
   2535         return self.sort_index(by=columns, axis=axis, ascending=ascending,                                                                           
-> 2536                                inplace=inplace)
   2537 
   2538     def sort_index(self, axis=0, by=None, ascending=True, inplace=False,    


/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/frame.pyc in sort_index(self, axis, by, ascending, inplace, kind)                 
   2603                 if k.ndim == 2:
   2604                     raise ValueError('Cannot sort by duplicate column % s'                                                                            
-> 2605                                      % str(by))
   2606                 indexer = k.argsort(kind=kind)
   2607                 if isinstance(ascending, (tuple, list)):

ValueError: Cannot sort by duplicate column ['A', 'B']

Update:

Here is the dataframe:

> my_df.head()
                            db_pixel                                      db_advertiser-campaign
0                Schnucks - Rockford  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
1                 Speedway Auto Mall  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
2   Hagerstown Honda_Homepage_1.9.14  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
3                      Mitchell Gold  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
4  Gambino Realtors - PropelRETARGET  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14

[5 rows x 2 columns]

Note that I am also having the error with the following command:

> my_df.head().sort(['db_pixel', 'db_advertiser-campaign'])
Josh asked Feb 26 '14 21:02




1 Answer

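I realized I was calling df.sort(columns=[my_columns]) instead of df.sort(columns=my_columns). Since my_columns was already a list, the extra brackets passed a nested list, so pandas looked up both columns at once and got a 2-D result, which is what the k.ndim == 2 check in the traceback rejects. In an effort to simplify the question, I didn't write the exact call I was making. Sorry for the confusion.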
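For anyone hitting this on a newer pandas, here is a minimal sketch of the same situation. The toy frame and the names df, my_columns, ndim_of_selection and sorted_rows are made up for illustration. It also assumes a pandas version where DataFrame.sort is gone (it was deprecated in 0.17 and later removed), so it uses sort_values instead of the call from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
df = pd.DataFrame({
    "A": [2, 1, 2, 1],
    "B": [1, 2, 0, 3],
})

my_columns = ["A", "B"]  # already a list of column labels

# Indexing with a list of labels returns a DataFrame, so its values are 2-D.
# This is the shape that the `k.ndim == 2` check in the old traceback rejects.
ndim_of_selection = df[my_columns].values.ndim

# Pass the list as-is (not [my_columns]) so each label is treated as one sort key.
sorted_df = df.sort_values(by=my_columns)
sorted_rows = sorted_df.values.tolist()
print(sorted_df)
```

The point is only that the sort keys should be a flat list of labels; whether a nested list raises the same message on current versions is not something this sketch checks.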

Josh answered Sep 28 '22 06:09