I'm curious about rank in pandas. How was the output below produced? What did rank do to the data?
Input:
import pandas as pd
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
print(obj)
Output:
0 7
1 -5
2 7
3 4
4 2
5 0
6 4
Ranking the same Series:
print(obj.rank())
Output:
0 6.5
1 1.0
2 6.5
3 4.5
4 3.0
5 2.0
6 4.5
According to the official pandas documentation, Series.rank does the following:
Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values.
This means that every value is assigned a "highscore", i.e. its position (rank) when the data is sorted from lowest to highest. The value 7 is the highest and therefore gets the highest rank, but because 7 appears twice, the two copies occupy both rank 6 AND rank 7. Since equal values cannot have two different ranks, each copy is assigned the average of those ranks: (6 + 7) / 2 = 6.5, which is the rank shown for both 7s. The two 4s work the same way: they occupy ranks 4 and 5, so each gets (4 + 5) / 2 = 4.5. The other values are more straightforward; for example, -5 is the lowest value and therefore gets the lowest rank, 1.0.
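To make the averaging concrete, here is a minimal sketch that recomputes the ranks by hand and checks them against obj.rank(). The helper average_rank is hypothetical, written only for illustration; it is not part of pandas. The last loop uses the real `method` parameter of Series.rank to show how other tie-breaking rules would rank the same data ("average" is the default).

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])


def average_rank(values):
    """Hypothetical helper: rank values 1..n, giving tied values the mean of their ranks."""
    # Indices of the values, ordered from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values equal to the one at sorted position `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # The tied run occupies ranks pos+1 .. end+1; each member gets their average.
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


manual = average_rank(obj.tolist())
print(manual)                          # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
print(manual == obj.rank().tolist())   # True

# Other tie-breaking rules supported by Series.rank(method=...).
for m in ["average", "min", "max", "first", "dense"]:
    print(m, obj.rank(method=m).tolist())
```

With "min" both 7s get 6 and with "max" both get 7, while "first" breaks the tie by order of appearance (6 for the first 7, 7 for the second).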