
Rank within columns of 2d array

>>> a = array([[10, 50, 20, 30, 40],
...            [50, 30, 40, 20, 10],
...            [30, 20, 20, 10, 50]])

>>> some_np_expression(a)
array([[1, 3, 1, 3, 2],
       [3, 2, 3, 2, 1],
       [2, 1, 2, 1, 3]])

What is some_np_expression? I don't care how ties are broken, as long as the ranks within each column are distinct and sequential.

MikeRand asked Jan 02 '15 02:01



2 Answers

Double argsort is a standard (but inefficient!) way to do this:

In [120]: a
Out[120]: 
array([[10, 50, 20, 30, 40],
       [50, 30, 40, 20, 10],
       [30, 20, 20, 10, 50]])

In [121]: a.argsort(axis=0).argsort(axis=0) + 1
Out[121]: 
array([[1, 3, 1, 3, 2],
       [3, 2, 3, 2, 1],
       [2, 1, 2, 1, 3]])
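
Why this works: the first argsort gives, for each sorted position in a column, the row index of the value that belongs there; the second argsort inverts that permutation, so each entry ends up holding its own sorted position. Below is a minimal sketch of the same idea as a function. The name rank_columns_double is hypothetical, and kind="stable" is an assumption added here so that ties are broken by row order instead of depending on the default sort.

```python
import numpy as np

def rank_columns_double(a):
    """Rank each column of a 2-D array (1-based) using two argsorts."""
    a = np.asarray(a)
    # order[k, j] is the row index of the k-th smallest value in column j.
    # kind="stable" (an assumption, not in the answer) keeps ties in row order.
    order = a.argsort(axis=0, kind="stable")
    # Argsorting a permutation inverts it: entry [r, j] becomes the
    # sorted position of a[r, j] within column j.
    return order.argsort(axis=0) + 1

a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])
ranks = rank_columns_double(a)
print(ranks)
```

The second argsort operates on a permutation, which has no ties, so only the first call needs the stable sort.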

With some more code, you can avoid sorting twice. Note that I'm using a different a in the following:

In [262]: a
Out[262]: 
array([[30, 30, 10, 10],
       [10, 20, 20, 30],
       [20, 10, 30, 20]])

Call argsort once:

In [263]: s = a.argsort(axis=0)

Use s to construct the array of rankings:

In [264]: i = np.arange(a.shape[0]).reshape(-1, 1)

In [265]: j = np.arange(a.shape[1])

In [266]: ranked = np.empty_like(a, dtype=int)

In [267]: ranked[s, j] = i + 1

In [268]: ranked
Out[268]: 
array([[3, 3, 1, 1],
       [1, 2, 2, 3],
       [2, 1, 3, 2]])

Here's the less efficient (but more concise) version:

In [269]: a.argsort(axis=0).argsort(axis=0) + 1
Out[269]: 
array([[3, 3, 1, 1],
       [1, 2, 2, 3],
       [2, 1, 3, 2]])
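
The single-argsort steps can also be collected into one function and checked against the double argsort. This is a sketch under a few assumptions: the name rank_columns_single is hypothetical, the input is taken to be 2-D, and kind="stable" is added so both versions break ties the same way.

```python
import numpy as np

def rank_columns_single(a):
    """Rank each column of a 2-D array (1-based) with a single argsort."""
    a = np.asarray(a)
    n_rows, n_cols = a.shape
    # s[k, j] is the row index of the k-th smallest value in column j.
    s = a.argsort(axis=0, kind="stable")
    i = np.arange(n_rows).reshape(-1, 1)   # sorted positions 0..n_rows-1, as a column
    j = np.arange(n_cols)                  # column indices, broadcast across rows
    ranked = np.empty((n_rows, n_cols), dtype=int)
    # Scatter: the value at row s[k, j] of column j receives rank k + 1.
    ranked[s, j] = i + 1
    return ranked

a = np.array([[30, 30, 10, 10],
              [10, 20, 20, 30],
              [20, 10, 30, 20]])
single = rank_columns_single(a)
double = a.argsort(axis=0, kind="stable").argsort(axis=0) + 1
print(np.array_equal(single, double))
```

The scatter assignment is a linear-time pass over the array, which is where the saving over a second sort comes from.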
Warren Weckesser answered Oct 04 '22 05:10



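SciPy now offers a rank function with an axis argument, so you can choose the axis along which to rank the data. Note that tied values receive their average rank, so the result is a float array and ties do not get distinct ranks.

import numpy as np
from scipy.stats.mstats import rankdata

a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])

# Rank down each column (axis=0); tied values share their average rank.
ranked_vertical = rankdata(a, axis=0)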
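
The question asks for distinct ranks, while scipy.stats.mstats.rankdata averages the ranks of tied values. One possible alternative, sketched here under the assumption that SciPy 1.4 or later is installed (the release in which scipy.stats.rankdata accepts axis), is to use method="ordinal", which gives tied values distinct ranks in order of occurrence.

```python
import numpy as np
from scipy.stats import rankdata  # not the mstats variant used above

a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])

# method="ordinal": every value gets a distinct rank; ties are ranked in the
# order they occur along the axis. axis=0 ranks down each column.
ordinal = rankdata(a, method="ordinal", axis=0)
print(ordinal)
```

For this input the ranks agree with the output requested in the question, including the tied 20s in the third column.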
Primoz answered Oct 04 '22 04:10
