pandas group by year, rank by sales column, in a dataframe with duplicate data

I would like to rank managers by Return within each year (so in 2012, Manager B is ranked 1, and in 2011, Manager B is ranked 1 again). I struggled with the pandas rank function for a while and do NOT want to resort to a for loop.

s = pd.DataFrame([['2012','A',3],['2012','B',8],['2011','A',20],['2011','B',30]], columns=['Year','Manager','Return'])

Out[1]:     
   Year Manager  Return    
0  2012       A       3    
1  2012       B       8    
2  2011       A      20    
3  2011       B      30

The issue I'm having is with the additional code (didn't think this would be relevant before):

s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])

s = s.append(b)
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)

raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

Any ideas? This is the real data structure I am using, and I have been having trouble re-indexing it.

asked Jul 11 '13 22:07

Ben

1 Answer

It sounds like you want to group by the Year, then rank the Returns in descending order.

import pandas as pd
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                 columns=['Year', 'Manager', 'Return'])
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
print(s)

yields

   Year Manager  Return  Rank
0  2012       A       3     2
1  2012       B       8     1
2  2011       A      20     2
3  2011       B      30     1
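A hedged aside on ties, since the question mentions duplicate data: rank defaults to method='average', so equal returns within a year share a fractional rank and the result is a float column. The sketch below is only an illustration. Manager 'C' is a hypothetical row added to force a tie, and method='dense' is one possible choice, not something the original answer prescribes.

```python
import pandas as pd

# Hypothetical data: manager 'C' is added here only to create a tie in 2012.
df = pd.DataFrame(
    [['2012', 'A', 3], ['2012', 'B', 8], ['2012', 'C', 8],
     ['2011', 'A', 20], ['2011', 'B', 30]],
    columns=['Year', 'Manager', 'Return'])

# Default method='average': the two tied 2012 managers share rank 1.5.
df['RankAvg'] = df.groupby('Year')['Return'].rank(ascending=False)

# method='dense': tied managers share rank 1 and the next value gets rank 2.
# The cast to int is safe here because there are no missing returns.
df['RankDense'] = (df.groupby('Year')['Return']
                     .rank(method='dense', ascending=False)
                     .astype(int))
print(df)
```

Other values of method ('min', 'max', 'first') break ties differently, so the right choice depends on how tied managers should be reported.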

To address the OP's revised question: The error message

ValueError: cannot reindex from a duplicate axis

occurs when you group and rank on a DataFrame whose index contains duplicate values. Here s.append(b) keeps the original row labels from both frames, so the labels 0 through 3 each appear twice. You can avoid the problem by giving s unique index values when appending:

s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
s = s.append(b, ignore_index=True)

yields

   Year Manager  Return
0  2012       A       3
1  2012       B       8
2  2011       A      20
3  2011       B      30
4  2012       A       3
5  2012       B       8
6  2011       A      20
7  2011       B      30
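Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append calls in this thread fail on current versions. A minimal sketch of the same idea with pd.concat follows. The variable names rows and combined are chosen here for illustration and are not from the original answer.

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
s = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])

# ignore_index=True discards the original labels and renumbers the rows 0..7,
# so the combined frame has a unique index.
combined = pd.concat([s, b], ignore_index=True)

# Rank within each year, highest Return first. Because every row appears
# twice, the default method='average' gives the duplicates a shared rank.
combined['Rank'] = combined.groupby('Year')['Return'].rank(ascending=False)
print(combined)
```

Because the data is duplicated, each pair of identical rows ties. If whole-number ranks are wanted, passing method='dense' or dropping the duplicate rows first are two options to consider.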

If you've already appended new rows using

s = s.append(b)

then use reset_index to create a unique index:

s = s.reset_index(drop=True)
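To confirm whether a frame actually has duplicate index labels before and after the fix, Index.is_unique is a quick check. The sketch below is an illustration under assumptions: it builds the duplicated index with pd.concat (without ignore_index) rather than the now-removed append, and the names stacked and fixed are hypothetical.

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
cols = ['Year', 'Manager', 'Return']

# Concatenating without ignore_index keeps each frame's labels: 0..3 twice.
stacked = pd.concat([pd.DataFrame(rows, columns=cols),
                     pd.DataFrame(rows, columns=cols)])
print(stacked.index.is_unique)   # False

# drop=True discards the old labels instead of adding them as a column.
fixed = stacked.reset_index(drop=True)
print(fixed.index.is_unique)     # True
```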

answered Sep 29 '22 13:09

unutbu

Tags: python, pandas, duplicates, pandas-groupby, rank
