Rank DataFrame based on multiple columns

Question

How can I rank a DataFrame based on 2 columns?

On below example, col_b would the tie breaker for col_a.

DataFrame:

df = pd.DataFrame({'col_a':[0,0,0,1,1,1], 'col_b':[5,2,8,3,7,4]})

df
   col_a  col_b
0      0      5
1      0      2
2      0      8
3      1      3
4      1      7
5      1      4

Expected Output:

   col_a  col_b  Rank
0      0      5   2
1      0      2   1
2      0      8   3
3      1      3   4
4      1      7   6
5      1      4   5

Mazdak · Accepted Answer

Here is a one-line approach using sort_values:

In [135]: df['rank'] = df.sort_values(['col_a', 'col_b'])['col_b'].index + 1

In [136]: df
Out[136]: 
   col_a  col_b  rank
0      0      5     2
1      0      2     1
2      0      8     3
3      1      3     4
4      1      7     6
5      1      4     5

The logic behind this snippet: Basically, the DataFrame.sort_values function accepts multiple column names and returns a sorted copy of the dataframe based on the order of passed column names. The default sorting order is ascending which is what we want. If you wanted another order you could pass the order as an iterable of booleans to the ascending keyword argument. At the end the new indices of the column_b is what we want (plus one).

Rank DataFrame based on multiple columns

Tags:

python

python-3.x

pandas

dataframe

ranking

Hugo

1 Answers

Mazdak

Recent Activity

Donate For Us

Rank DataFrame based on multiple columns

Tags:

python

python-3.x

pandas

dataframe

ranking

Hugo

1 Answers

Mazdak

Related questions

Recent Activity

Donate For Us