I am trying to rank a pandas data frame based on two columns. I can rank it based on one column, but how can to rank it based on two columns? 'SaleCount', then 'TotalRevenue'?
import pandas as pd
df = pd.DataFrame({'TotalRevenue':[300,9000,1000,750,500,2000,0,600,50,500],
'Date':['2016-12-02' for i in range(10)],
'SaleCount':[10,100,30,35,20,100,0,30,2,20],
'shops':['S3','S2','S1','S5','S4','S8','S6','S7','S9','S10']})
df['Rank'] = df.SaleCount.rank(method='dense',ascending = False).astype(int)
#df['Rank'] = df.TotalRevenue.rank(method='dense',ascending = False).astype(int)
df.sort_values(['Rank'], inplace=True)
print(df)
current output:
Date SaleCount TotalRevenue shops Rank
1 2016-12-02 100 9000 S2 1
5 2016-12-06 100 2000 S8 1
3 2016-12-04 35 750 S5 2
2 2016-12-03 30 1000 S1 3
7 2016-12-08 30 600 S7 3
9 2016-12-10 20 500 S10 4
4 2016-12-05 20 500 S4 4
0 2016-12-01 10 300 S3 5
8 2016-12-09 2 50 S9 6
6 2016-12-07 0 0 S6 7
I'm trying to generate an output like this:
Date SaleCount TotalRevenue shops Rank
1 2016-12-02 100 9000 S2 1
5 2016-12-02 100 2000 S8 2
3 2016-12-02 35 750 S5 3
2 2016-12-02 30 1000 S1 4
7 2016-12-02 30 600 S7 5
9 2016-12-02 20 500 S10 6
4 2016-12-02 20 500 S4 6
0 2016-12-02 10 300 S3 7
8 2016-12-02 2 50 S9 8
6 2016-12-02 0 0 S6 9
Select a blank cell which you will place the ranking result, for instance, D2, and type this formula =RANK(B2,$B$2:$B$7)+SUMPRODUCT(--($B$2:$B$7=$B2),--(C2<$C$2:$C$7)), press Enter key, and then drag fill handle over the cells which use this formula.
Pandas DataFrame: rank() function The rank() function is used to compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values. Index to direct ranking. How to rank the group of records that have the same value (i.e. ties):
For assigning the rank to our dataframe's elements, we use a built-in function of the pandas library that is the . rank() function. We pass the criteria based on which we are ranking the elements to it, and this function returns a new column in each row where the ranking is stored.
The generic way to do that is to group the desired fiels in a tuple, whatever the types.
df["Rank"] = df[["SaleCount","TotalRevenue"]].apply(tuple,axis=1)\
.rank(method='dense',ascending=False).astype(int)
df.sort_values("Rank")
TotalRevenue Date SaleCount shops Rank
1 9000 2016-12-02 100 S2 1
5 2000 2016-12-02 100 S8 2
3 750 2016-12-02 35 S5 3
2 1000 2016-12-02 30 S1 4
7 600 2016-12-02 30 S7 5
4 500 2016-12-02 20 S4 6
9 500 2016-12-02 20 S10 6
0 300 2016-12-02 10 S3 7
8 50 2016-12-02 2 S9 8
6 0 2016-12-02 0 S6 9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With