 

How to rank distinctly for each row in pandas.DataFrame

What I have

A user-user similarity matrix in which some rows have duplicated values and NaN:

userId  316       320       359       370       910
userId                                             
316     1.0  0.500000  0.500000  0.500000       NaN
320     0.5  1.000000  0.242837  0.019035  0.031737
359     0.5  0.242837  1.000000  0.357620  0.175914
370     0.5  0.019035  0.357620  1.000000  0.317371
910     NaN  0.031737  0.175914  0.317371  1.000000
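To reproduce the problem, the matrix above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the printed values are exact and that both axes are named userId, as the printout suggests. The variable names users and data are illustrative only.

```python
import numpy as np
import pandas as pd

# Values copied from the printed similarity matrix above.
users = [316, 320, 359, 370, 910]
data = [
    [1.0, 0.500000, 0.500000, 0.500000, np.nan],
    [0.5, 1.000000, 0.242837, 0.019035, 0.031737],
    [0.5, 0.242837, 1.000000, 0.357620, 0.175914],
    [0.5, 0.019035, 0.357620, 1.000000, 0.317371],
    [np.nan, 0.031737, 0.175914, 0.317371, 1.000000],
]

# Both the row index and the column index are labelled "userId".
df = pd.DataFrame(
    data,
    index=pd.Index(users, name="userId"),
    columns=pd.Index(users, name="userId"),
)
print(df)
```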

What I want

I want to rank the similarity within each row, giving every value a distinct rank, like so:

userId  316  320  359  370  910
userId                         
316       1    2    3    4   NaN
320       2    1    3    5    4
359       2    4    1    3    5
370       2    5    3    1    4
910      NaN   4    3    2    1

The order among tied values is not important, but each value must still get a distinct rank. And NaN must be kept.

What I tried

I tried df.rank(ascending=False, axis=1) (doc), but tied values still share a rank: with the default method='average', the three 0.5 values in row 316 all get 3.0.
I also tried scipy.stats.rankdata (doc), but it can't keep NaN.
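The tie problem can be seen on row 316 alone. This sketch only uses DataFrame.rank's documented method values; none of the tie-aware methods ('average', 'min', 'max', 'dense') give the three tied 0.5 values distinct ranks, although all of them leave the NaN in place.

```python
import numpy as np
import pandas as pd

# Row 316 from the question: three tied 0.5 values and one NaN.
row = pd.DataFrame([[1.0, 0.5, 0.5, 0.5, np.nan]],
                   columns=[316, 320, 359, 370, 910])

# Each of these methods assigns the same rank to tied values.
for method in ["average", "min", "max", "dense"]:
    ranked = row.rank(axis=1, ascending=False, method=method)
    print(method, ranked.iloc[0].tolist())
# average [1.0, 3.0, 3.0, 3.0, nan]
# min [1.0, 2.0, 2.0, 2.0, nan]
# max [1.0, 4.0, 4.0, 4.0, nan]
# dense [1.0, 2.0, 2.0, 2.0, nan]
```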

Dawei asked Jan 03 '23 04:01


1 Answer

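Use rank with method='first'. Tied values are then ranked in the order they appear in the row, so every value gets a distinct rank, and NaN is left as NaN (the default na_option='keep'):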

df.rank(axis=1, ascending=False, method='first')

     316  320  359  370  910
316  1.0  2.0  3.0  4.0  NaN
320  2.0  1.0  3.0  5.0  4.0
359  2.0  4.0  1.0  3.0  5.0
370  2.0  5.0  3.0  1.0  4.0
910  NaN  4.0  3.0  2.0  1.0
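A self-contained check of the answer, which also converts the float ranks to pandas' nullable "Int64" dtype so they print as integers with <NA> for missing values, closer to the table in the question. The Int64 cast is an addition, not part of the original answer, and assumes pandas 1.0 or later.

```python
import numpy as np
import pandas as pd

users = [316, 320, 359, 370, 910]
df = pd.DataFrame(
    [
        [1.0, 0.5, 0.5, 0.5, np.nan],
        [0.5, 1.0, 0.242837, 0.019035, 0.031737],
        [0.5, 0.242837, 1.0, 0.357620, 0.175914],
        [0.5, 0.019035, 0.357620, 1.0, 0.317371],
        [np.nan, 0.031737, 0.175914, 0.317371, 1.0],
    ],
    index=users,
    columns=users,
)

# method='first' breaks ties by column order; NaN stays NaN (na_option='keep').
ranks = df.rank(axis=1, ascending=False, method="first")

# Optional: nullable integer dtype, so ranks show as 1, 2, ... and NaN as <NA>.
ranks_int = ranks.astype("Int64")
print(ranks_int)

# Sanity checks: distinct ranks within each row, NaN positions unchanged.
assert all(row.dropna().is_unique for _, row in ranks.iterrows())
assert ranks.isna().equals(df.isna())
```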
Scott Boston answered Jan 10 '23 20:01