I am ranking a float variable in Pandas and I want to force ranks to be unique (no duplicate ranks in the event of ties).
This is what happens:
import numpy as np
import pandas as pd
vals = pd.Series([0.0133, 0.0018, np.nan, 0.0006, 0.0006])
vals.rank(ascending=False, method='dense')
0 1.0
1 2.0
2 NaN
3 3.0
4 3.0
I would like the result to instead be
0 1.0
1 2.0
2 NaN
3 3.0
4 4.0
Can I do this with the rank method, or do I have to do this manually with some sorting and looping logic?
You can use 'first' for the method (see the Series.rank docs):
first: ranks assigned in order they appear in the array
ser = pd.Series([1, 2, np.nan, 3, 3, 4])
ser.rank(method='first')
Out:
0 1.0
1 2.0
2 NaN
3 3.0
4 4.0
5 5.0
dtype: float64
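Applied to the data from the question, this could look like the sketch below. It assumes the descending order used there should be kept, and that the tie at 0.0006 should be broken by position in the Series; the variable name ranks is only illustrative.

```python
import numpy as np
import pandas as pd

vals = pd.Series([0.0133, 0.0018, np.nan, 0.0006, 0.0006])

# method='first' gives tied values distinct ranks in order of appearance;
# NaN stays NaN under the default na_option='keep'.
ranks = vals.rank(ascending=False, method='first')
print(ranks)
# 0    1.0
# 1    2.0
# 2    NaN
# 3    3.0
# 4    4.0
# dtype: float64
```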
To clarify ayhan's answer on this (since I don't have enough reputation to edit or comment!): df.rank(method='first') breaks ties in the order the rows appear, so it only gives the ranking you want if the DataFrame is already sorted the way you want.
So you need to first sort your DataFrame using df.sort_values(), then you can rank it with df.rank(method='first').
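A rough sketch of that sort-then-rank idea follows. The column names (name, score, time) and the rule "the smaller time wins a tie on score" are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical data: b and c tie on score.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [10, 20, 20, 30],
    "time": [5, 9, 3, 7],
})

# Without sorting, the tie is broken by row order: b ranks ahead of c.
unsorted_rank = df["score"].rank(method="first", ascending=False)

# Sort by the tie-breaker first so that c (smaller time) appears before b,
# then rank; the result aligns back to df on the index.
ordered = df.sort_values("time")
df["rank"] = ordered["score"].rank(method="first", ascending=False)
print(df)
```

Here only the order of appearance decides the tie, so the sort key is chosen to be whatever should act as the tie-breaker.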