Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ranking order per group in Pandas

Tags:

python

pandas

Consider a dataframe with three columns: group_ID, item_ID and value. Say we have 10 itemIDs total.

I need to rank each item_ID (1 to 10) within each group_ID based on value, and then see the mean rank (and other stats) across groups (e.g. the IDs with the highest value across groups would get ranks closer to 1). How can I do this in Pandas?

This answer does something very close with qcut, but not exactly the same.


A data example would look like:

      group_ID   item_ID  value 0   0S00A1HZEy        AB     10 1   0S00A1HZEy        AY      4 2   0S00A1HZEy        AC     35 3   0S03jpFRaC        AY     90 4   0S03jpFRaC        A5      3 5   0S03jpFRaC        A3     10 6   0S03jpFRaC        A2      8 7   0S03jpFRaC        A4      9 8   0S03jpFRaC        A6      2 9   0S03jpFRaC        AX      0 

which would result in:

      group_ID   item_ID   rank 0   0S00A1HZEy        AB      2 1   0S00A1HZEy        AY      3 2   0S00A1HZEy        AC      1 3   0S03jpFRaC        AY      1 4   0S03jpFRaC        A5      5 5   0S03jpFRaC        A3      2 6   0S03jpFRaC        A2      4 7   0S03jpFRaC        A4      3 8   0S03jpFRaC        A6      6 9   0S03jpFRaC        AX      7 
like image 815
Amelio Vazquez-Reina Avatar asked Nov 24 '15 16:11

Amelio Vazquez-Reina


People also ask

What does rank () do in pandas?

Pandas DataFrame: rank() function The rank() function is used to compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values. Index to direct ranking.

Does group by order matter pandas?

Show activity on this post. No, the order doesn't matter for the GROUP BY clause.

How do you get top 5 values in pandas?

Python's Pandas module provide easy ways to do aggregation and calculate metrics. Finding Top 5 maximum value for each group can also be achieved while doing the group by. The function that is helpful for finding the Top 5 maximum value is nlargest().


1 Answers

There are lots of different arguments you can pass to rank; it looks like you can use rank("dense", ascending=False) to get the results you want, after doing a groupby:

>>> df["rank"] = df.groupby("group_ID")["value"].rank("dense", ascending=False) >>> df      group_ID item_ID  value  rank 0  0S00A1HZEy      AB     10     2 1  0S00A1HZEy      AY      4     3 2  0S00A1HZEy      AC     35     1 3  0S03jpFRaS      AY     90     1 4  0S03jpFRaS      A5      3     5 5  0S03jpFRaS      A3     10     2 6  0S03jpFRaS      A2      8     4 7  0S03jpFRaS      A4      9     3 8  0S03jpFRaS      A6      2     6 9  0S03jpFRaS      AX      0     7 

But note that if you're not using a global ranking scheme, finding out the mean rank across groups isn't very meaningful-- unless there are duplicate values in a group (and so you have duplicate rank values) all you're doing is measuring how many elements there are in a group.

like image 53
DSM Avatar answered Sep 23 '22 03:09

DSM