Consider a dataframe with three columns: group_ID
, item_ID
and value
. Say we have 10 itemIDs
total.
I need to rank each item_ID
(1 to 10) within each group_ID
based on value
, and then see the mean rank (and other stats) across groups (e.g. the IDs with the highest value across groups would get ranks closer to 1). How can I do this in Pandas?
This answer does something very close with qcut
, but not exactly the same.
A data example would look like:
group_ID item_ID value 0 0S00A1HZEy AB 10 1 0S00A1HZEy AY 4 2 0S00A1HZEy AC 35 3 0S03jpFRaC AY 90 4 0S03jpFRaC A5 3 5 0S03jpFRaC A3 10 6 0S03jpFRaC A2 8 7 0S03jpFRaC A4 9 8 0S03jpFRaC A6 2 9 0S03jpFRaC AX 0
which would result in:
group_ID item_ID rank 0 0S00A1HZEy AB 2 1 0S00A1HZEy AY 3 2 0S00A1HZEy AC 1 3 0S03jpFRaC AY 1 4 0S03jpFRaC A5 5 5 0S03jpFRaC A3 2 6 0S03jpFRaC A2 4 7 0S03jpFRaC A4 3 8 0S03jpFRaC A6 6 9 0S03jpFRaC AX 7
Pandas DataFrame: rank() function The rank() function is used to compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values. Index to direct ranking.
Show activity on this post. No, the order doesn't matter for the GROUP BY clause.
Python's Pandas module provide easy ways to do aggregation and calculate metrics. Finding Top 5 maximum value for each group can also be achieved while doing the group by. The function that is helpful for finding the Top 5 maximum value is nlargest().
There are lots of different arguments you can pass to rank
; it looks like you can use rank("dense", ascending=False)
to get the results you want, after doing a groupby
:
>>> df["rank"] = df.groupby("group_ID")["value"].rank("dense", ascending=False) >>> df group_ID item_ID value rank 0 0S00A1HZEy AB 10 2 1 0S00A1HZEy AY 4 3 2 0S00A1HZEy AC 35 1 3 0S03jpFRaS AY 90 1 4 0S03jpFRaS A5 3 5 5 0S03jpFRaS A3 10 2 6 0S03jpFRaS A2 8 4 7 0S03jpFRaS A4 9 3 8 0S03jpFRaS A6 2 6 9 0S03jpFRaS AX 0 7
But note that if you're not using a global ranking scheme, finding out the mean rank across groups isn't very meaningful-- unless there are duplicate values in a group (and so you have duplicate rank values) all you're doing is measuring how many elements there are in a group.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With