Assume I have a 2D NumPy array holding probabilities for m samples over n classes (the probabilities sum to 1 for each sample).
Assuming each sample can belong to only one class, I want to create a new array with the same shape as the original, but with only binary values indicating which class had the highest probability.
Example:
[[0.2, 0.3, 0.5], [0.7, 0.1, 0.1]]
should be converted to:
[[0, 0, 1], [1, 0, 0]]
It seems argmax already does almost what I want, but instead of the indices I want an indicator matrix as described above.
This seems simple, but somehow I can't figure it out using standard NumPy functions. I could use regular Python loops, of course, but it seems there should be a simpler way.
In case multiple classes have the same probability, I would prefer a solution which only selects one of the classes (I don't care which in this case).
Thanks!
You can use argmax() to get the index of your maximum value. Without an axis argument, this is an index into the flattened array, so you then have to convert it (for example with np.unravel_index) to get the row and column indices.
numpy.matrix.max(): with this method, we can get the maximum value of a given matrix.
numpy.maximum() finds the element-wise maximum of two arrays. It compares them and returns a new array containing the element-wise maxima.
NumPy provides argmin() and argmax(), which return the index of the minimum and maximum of an array, respectively. Note that they return only the index of the first occurrence.
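As a minimal sketch of the argmax() behaviour described above (the variable names are only illustrative), the following uses the array from the question plus a small tied example:

```python
import numpy as np

a = np.array([[0.2, 0.3, 0.5],
              [0.7, 0.1, 0.1]])

# Without an axis, argmax() indexes into the flattened array.
flat_index = a.argmax()

# Convert the flat index back into (row, column) coordinates.
row, col = np.unravel_index(flat_index, a.shape)

# With axis=1, argmax() returns one column index per row.
per_row = a.argmax(axis=1)

# On ties, only the index of the first occurrence is returned.
ties = np.array([0.5, 0.5, 0.0])
first = ties.argmax()
```

Here per_row holds the class index for each sample, which is the starting point for building the indicator matrix.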
Here's one way:
In [112]: a
Out[112]:
array([[ 0.2,  0.3,  0.5],
       [ 0.7,  0.1,  0.1]])

In [113]: a == a.max(axis=1, keepdims=True)
Out[113]:
array([[False, False,  True],
       [ True, False, False]], dtype=bool)

In [114]: (a == a.max(axis=1, keepdims=True)).astype(int)
Out[114]:
array([[0, 0, 1],
       [1, 0, 0]])
(But this will give a True value for each occurrence of the maximum in a row. See Divakar's answer for a nice way to select just the first occurrence of the maximum.)
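To illustrate that caveat, here is a small sketch (the tied input row is made up for the demonstration) showing that the comparison against the row maximum marks every tied position:

```python
import numpy as np

# Second row has two classes tied for the maximum.
a = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.5, 0.0]])

# Compare each element with its row maximum (kept as a column for broadcasting).
mask = (a == a.max(axis=1, keepdims=True)).astype(int)

# Number of ones per row; a value above 1 means a tie was marked more than once.
row_counts = mask.sum(axis=1)
```

For this input the second row of mask contains two ones, so this approach does not meet the "select only one class" requirement when ties occur.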
In case of ties (two or more elements being the highest one in a row), where you want to select only one, here's one approach to do so with np.argmax and broadcasting:
(A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)
Sample run:

In [296]: A
Out[296]:
array([[ 0.2,  0.3,  0.5],
       [ 0.5,  0.5,  0. ]])

In [297]: (A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)
Out[297]:
array([[0, 0, 1],
       [1, 0, 0]])
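An alternative that is not from the answers above, sketched here only as one possible variant: start from an all-zero array and set a single entry per row with integer indexing. The helper name one_hot_argmax is hypothetical.

```python
import numpy as np

def one_hot_argmax(probs):
    """Return an integer indicator array with a single 1 per row at the row's argmax.

    Hypothetical helper for illustration; assumes a 2D input.
    """
    probs = np.asarray(probs)
    out = np.zeros(probs.shape, dtype=int)
    # Pair each row index with the column index of that row's first maximum.
    out[np.arange(probs.shape[0]), probs.argmax(axis=1)] = 1
    return out

A = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.5, 0.0]])
result = one_hot_argmax(A)
```

Because argmax() returns only the first occurrence, each row of result contains exactly one 1 even when the maximum is tied.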