Force incrementation in pandas rank method

I am ranking a float variable in pandas and I want to force the ranks to be unique (no duplicate ranks in the event of ties).

This is what happens:

import numpy as np
import pandas as pd
vals = pd.Series([0.0133, 0.0018, np.nan, 0.0006, 0.0006])
vals.rank(ascending=False, method='dense')

0    1.0
1    2.0
2    NaN
3    3.0
4    3.0

I would like the result to instead be

0    1.0
1    2.0
2    NaN
3    3.0
4    4.0

Can I do this with the rank method or do I have to do this manually with some sorting and looping logic?

Chris asked Nov 20 '16 20:11


2 Answers

You can use method='first' (see the Series.rank docs):

first: ranks assigned in order they appear in the array

ser = pd.Series([1, 2, np.nan, 3, 3, 4])

ser.rank(method='first')
Out: 
0    1.0
1    2.0
2    NaN
3    3.0
4    4.0
5    5.0
dtype: float64
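
Applied to the data from the question, this might look like the sketch below. It assumes the tie between the two 0.0006 values should be broken by position, which is what method='first' does; the variable name ranks is only for illustration.

```python
import numpy as np
import pandas as pd

# Data from the question: two tied values and one missing value
vals = pd.Series([0.0133, 0.0018, np.nan, 0.0006, 0.0006])

# method='first' gives tied values distinct ranks, ordered by where
# they appear in the Series; NaN stays NaN under the default na_option
ranks = vals.rank(ascending=False, method='first')
print(ranks)
```

The non-missing ranks come out unique, so no sorting or looping by hand should be needed for this case.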
ayhan answered Oct 14 '22 09:10


To clarify ayhan's answer on this (since I don't have enough reputation to edit or comment!)

df.rank(method='first') breaks ties in the order the rows appear, so it only gives the tie order you want if the DataFrame is already sorted that way.

So you need to sort your DataFrame first with df.sort_values(), then rank it with df.rank(method='first').
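
A minimal sketch of the sort-then-rank idea, assuming you want tied scores ordered alphabetically by a second column. The column names name and score and the data are made up for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical data: rows "b" and "a" tie on score
df = pd.DataFrame({
    "name": ["b", "a", "c"],
    "score": [0.5, 0.5, 0.9],
})

# Sort so that tied scores appear in the order the tie should be broken
ordered = df.sort_values(["score", "name"], ascending=[False, True])

# Ties now receive ranks in the sorted order of appearance
ordered["rank"] = ordered["score"].rank(ascending=False, method="first")

# Go back to the original row order; the index labels carry the ranks along
result = ordered.sort_index()
print(result)
```

Without the sort, row "b" would be ranked ahead of row "a" simply because it comes first in the frame.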

Silas answered Oct 14 '22 09:10