Pandas - Cannot sort by duplicate column

Tags:

python

pandas

When I sort one of my dataframes, e.g.:

my_df.sort(['column_A', 'column_B'])

I get:

ValueError: Cannot sort by duplicate column ['A', 'B']

The columns have different data and different names. Here is the full error:

/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/frame.pyc in sort(self, columns, column, axis, ascending, inplace)                
   2534             columns = column
   2535         return self.sort_index(by=columns, axis=axis, ascending=ascending,                                                                           
-> 2536                                inplace=inplace)
   2537 
   2538     def sort_index(self, axis=0, by=None, ascending=True, inplace=False,    


/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/frame.pyc in sort_index(self, axis, by, ascending, inplace, kind)                 
   2603                 if k.ndim == 2:
   2604                     raise ValueError('Cannot sort by duplicate column % s'                                                                            
-> 2605                                      % str(by))
   2606                 indexer = k.argsort(kind=kind)
   2607                 if isinstance(ascending, (tuple, list)):

ValueError: Cannot sort by duplicate column ['A', 'B']

Update:

Here is the dataframe:

> my_df.head()
                            db_pixel                                      db_advertiser-campaign
0                Schnucks - Rockford  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
1                 Speedway Auto Mall  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
2   Hagerstown Honda_Homepage_1.9.14  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
3                      Mitchell Gold  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14
4  Gambino Realtors - PropelRETARGET  GateHouse Media- Inc. Q1_2013--Katy's Pet Cemetary_1.13.14

[5 rows x 2 columns]

Note that I am also having the error with the following command:

> my_df.head().sort(['db_pixel', 'db_advertiser-campaingn'])

asked Feb 26 '14 21:02

Josh

1 Answer

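I realized I was calling df.sort(columns=[my_columns]) instead of df.sort(columns=my_columns). Since my_columns was already a list of column names, the extra brackets passed a nested list, which appears to be what triggered the "duplicate column" error. In an effort to simplify the question, I didn't write the exact call I was making. Sorry for the confusion.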
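To illustrate the difference between the two calls, here is a minimal sketch in plain Python. The helper `as_flat_column_list` is hypothetical (it is not part of pandas), and the column names are the placeholder ones from the question. It only shows how a list wrapped in a second list can be flattened before it is handed to a sort call; it is not a reproduction of what pandas does internally.

```python
def as_flat_column_list(columns):
    """Return a flat list of column names.

    Hypothetical helper, not part of pandas: accepts a single name,
    a list of names, or a list accidentally wrapped in another list.
    """
    if isinstance(columns, str):
        return [columns]
    flat = []
    for item in columns:
        if isinstance(item, (list, tuple)):
            flat.extend(item)  # unwrap one accidental level of nesting
        else:
            flat.append(item)
    return flat


my_columns = ["column_A", "column_B"]

wrapped = [my_columns]    # what the failing call passed: [['column_A', 'column_B']]
assert len(wrapped) == 1  # a single element that is itself a list

# Both the wrapped and the already-flat form normalize to the same list.
assert as_flat_column_list(wrapped) == my_columns
assert as_flat_column_list(my_columns) == my_columns

# The flat list would then be passed to the sort call, for example:
#   my_df.sort(columns=as_flat_column_list(wrapped))    # pandas version from the question
#   my_df.sort_values(by=as_flat_column_list(wrapped))  # pandas 0.17 and later
```

Note that `DataFrame.sort` was deprecated in pandas 0.17 and later removed, so on a current install the equivalent call is `sort_values(by=...)`.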

answered Sep 28 '22 06:09

Josh
