
How to convert 2d numpy array into binary indicator matrix for max value

I have a 2D NumPy array holding probabilities for m samples across n classes (the probabilities sum to 1 for each sample).

Assuming each sample can only be in one category, I want to create a new array with the same shape as the original, but with only binary values indicating which class had the highest probability.

Example:

[[0.2, 0.3, 0.5], [0.7, 0.1, 0.1]]

should be converted to:

[[0, 0, 1], [1, 0, 0]]

It seems argmax already does almost what I want, but instead of the indices I want an indicator matrix as described above.

Seems simple, but somehow I can't figure it out using standard numpy functions. I could use regular python loops of course, but it seems there should be a simpler way.

In case multiple classes have the same probability, I would prefer a solution which only selects one of the classes (I don't care which in this case).

Thanks!

aKzenT asked Mar 22 '16 11:03




2 Answers

Here's one way:

In [112]: a
Out[112]: 
array([[ 0.2,  0.3,  0.5],
       [ 0.7,  0.1,  0.1]])

In [113]: a == a.max(axis=1, keepdims=True)
Out[113]: 
array([[False, False,  True],
       [ True, False, False]], dtype=bool)

In [114]: (a == a.max(axis=1, keepdims=True)).astype(int)
Out[114]: 
array([[0, 0, 1],
       [1, 0, 0]])

(But this will give a True value for each occurrence of the maximum in a row. See Divakar's answer for a nice way to select just the first occurrence of the maximum.)
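To make the tie behavior concrete, here is a minimal self-contained sketch of the same comparison applied to an array whose second row has two equal maxima. The variable names (row_max, indicator, ones_per_row) are illustrative and not part of the answer above.

```python
import numpy as np

# Row 0 has a unique maximum; row 1 has a tie between columns 0 and 1.
a = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.5, 0.0]])

# keepdims=True keeps the row maxima as an (m, 1) column, so it
# broadcasts against the (m, n) input in the comparison below.
row_max = a.max(axis=1, keepdims=True)
indicator = (a == row_max).astype(int)

# Number of ones per row; this exceeds 1 wherever a row has tied maxima.
ones_per_row = indicator.sum(axis=1)

print(indicator)
print(ones_per_row)
```

Checking ones_per_row is one simple way to detect whether ties occur in your data before deciding which approach you need.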

Warren Weckesser answered Oct 19 '22 23:10


When a row has ties (two or more elements share the maximum) and you want to select only one of them, here's an approach using np.argmax and broadcasting -

(A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)

Sample run -

In [296]: A
Out[296]: 
array([[ 0.2,  0.3,  0.5],
       [ 0.5,  0.5,  0. ]])

In [297]: (A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)
Out[297]: 
array([[0, 0, 1],
       [1, 0, 0]])
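The one-liner can be unpacked into a small function, which may make the broadcasting step easier to follow. This is only a sketch: the function names argmax_indicator and argmax_indicator_by_assignment are hypothetical, and the second function is an alternative formulation (index assignment into a zero array) rather than the method shown above.

```python
import numpy as np

def argmax_indicator(probs):
    """Hypothetical helper: mark the first maximum of each row with a 1."""
    probs = np.asarray(probs)
    # Column index of the first maximum in each row, shape (m,).
    winners = probs.argmax(axis=1)
    # An (m, 1) column of winners compared with an (n,) row of column
    # indices broadcasts to an (m, n) boolean matrix.
    return (winners[:, None] == np.arange(probs.shape[1])).astype(int)

def argmax_indicator_by_assignment(probs):
    """Alternative sketch: write the ones directly into a zero array."""
    probs = np.asarray(probs)
    out = np.zeros(probs.shape, dtype=int)
    # Pair each row index with that row's winning column index.
    out[np.arange(probs.shape[0]), probs.argmax(axis=1)] = 1
    return out

A = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.5, 0.0]])
one_hot = argmax_indicator(A)
one_hot_alt = argmax_indicator_by_assignment(A)
print(one_hot)
```

Because argmax returns the first occurrence of the maximum, both versions pick the lowest column index when a row has ties.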
Divakar answered Oct 20 '22 00:10