I have a 2D matrix of values, and I want to find the indices of the top 5 values. For example, for
matrix([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
[0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
[0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
[0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
[0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])
I want to get (0,3), (0,1), (0,4), (3,1), (4,1)
I searched and tried many workarounds, including np.argmax(), np.argsort(), and np.argpartition(), without any good results.
For example:
>>> np.dstack(np.unravel_index(np.argsort(a.ravel(),axis=None), a.shape))
array([[[0, 4],
[0, 3],
[0, 2],
[2, 4],
[4, 4],
[1, 4],
[3, 4],
[3, 3],
[1, 3],
[2, 3],
[1, 2],
[4, 3],
[3, 2],
[4, 2],
[2, 2],
[2, 1],
[1, 1],
[0, 1],
[1, 0],
[2, 0],
[4, 1],
[3, 1],
[4, 0],
[0, 0],
[3, 0]]], dtype=int64)
This result makes no sense to me. Note that I want the original indices, and I don't care about the order (I just want the top 5 in any order, though ascending would be better).
np.argpartition should be a good (efficient) tool for getting those top k indices without maintaining order. Hence, for array data a, it would be -
In [43]: np.c_[np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)]
Out[43]:
array([[4, 1],
[3, 1],
[4, 0],
[0, 0],
[3, 0]])
To explain, let's break it down into individual steps -
# Get partitioned indices such that the last 5 indices refer to the top 5
# values taken globally from the input array. Refer to docs for more info
# Note that it's global because we will flatten it.
In [9]: np.argpartition(a.ravel(),-5)
Out[9]:
array([14, 24, 2, 3, 4, 23, 22, 7, 8, 9, 19, 18, 17, 13, 12, 11, 6,
1, 5, 10, 21, 16, 20, 0, 15])
# Get last 5 indices, which are the top 5 valued indices
In [10]: np.argpartition(a.ravel(),-5)[-5:]
Out[10]: array([21, 16, 20, 0, 15])
# Convert the global indices back to row-col format
In [11]: np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)
Out[11]: (array([4, 3, 4, 0, 3]), array([1, 1, 0, 0, 0]))
# Stack into two-columnar array
In [12]: np.c_[np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)]
Out[12]:
array([[4, 1],
[3, 1],
[4, 0],
[0, 0],
[3, 0]])
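Since the question mentions that ascending order would be preferable, the k partitioned indices can be sorted by their values afterwards. Below is a minimal sketch, assuming a 2D input and k = 5; the helper name top_k_indices is hypothetical and not part of NumPy.

```python
import numpy as np

def top_k_indices(a, k):
    """Return a (k, 2) array of row-col indices of the k largest values,
    ordered by ascending value. Hypothetical helper, not a NumPy function."""
    arr = np.asarray(a)
    flat = arr.ravel()
    # Unordered flat indices of the k largest values
    idx = np.argpartition(flat, -k)[-k:]
    # Sort only those k indices by the values they point to (ascending)
    idx = idx[np.argsort(flat[idx])]
    # Convert flat indices back to row-col pairs
    return np.c_[np.unravel_index(idx, arr.shape)]

a = np.array([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
              [0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
              [0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
              [0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
              [0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])

result = top_k_indices(a, 5)
print(result)
```

Sorting only the k selected indices keeps the extra cost small compared with a full argsort of the whole array.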
For matrix data in a, it would be -
In [48]: np.dstack(np.unravel_index(np.argpartition(a.ravel(),-5)[:,-5:],a.shape))
Out[48]:
array([[[4, 1],
[3, 1],
[4, 0],
[0, 0],
[3, 0]]])
So, compared to the array case, the only differences are the [:,-5:] slice and the use of np.dstack, because matrix data always stays 2D, even after ravel().
Notice that these are the same as the last 5 rows of your own np.argsort result.
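For matrix input, an alternative is to convert to a plain ndarray first, so that the array version applies unchanged and returns a two-column result. This is only a sketch, assuming the input is an np.matrix; the variable names m, arr, idx and pairs are illustrative.

```python
import numpy as np

m = np.matrix([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
               [0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
               [0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
               [0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
               [0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])

# np.asarray returns a base ndarray, so ravel() now yields a 1D array
arr = np.asarray(m)
idx = np.argpartition(arr.ravel(), -5)[-5:]
pairs = np.c_[np.unravel_index(idx, arr.shape)]
print(pairs.shape)
```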
Your sample:
n = np.array([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
[0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
[0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
[0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
[0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])
Your expected output is not the indices of the top 5 values. The top 5 values are
array([0.14538635, 0.15853737, 0.1695696 , 0.17542851, 0.19032505])
To get their indices, sort the values and use np.isin to flag their locations as True. Finally, use np.argwhere to get their positions:
np.argwhere(np.isin(n, np.sort(n, axis=None)[-5:]))
Out[324]:
array([[0, 0],
[3, 0],
[3, 1],
[4, 0],
[4, 1]], dtype=int32)
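One caveat with this approach: np.isin flags every element equal to one of the top values, so if the data contains ties it can return more than k positions. A small sketch with made-up data (the array t is invented for illustration):

```python
import numpy as np

# Made-up data in which the maximum value 3 appears twice
t = np.array([[3, 1],
              [3, 2]])

k = 1
top_values = np.sort(t, axis=None)[-k:]      # the k largest values
pos = np.argwhere(np.isin(t, top_values))    # every position holding one of them
print(pos)

# argpartition, by contrast, always yields exactly k flat indices
exact = np.argpartition(t.ravel(), -k)[-k:]
print(len(exact))
```

With distinct values, as in the sample above, both approaches select the same positions.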