How to convert 2d numpy array into binary indicator matrix for max value

Tags: python, machine-learning, numpy, python-2.7

Assume I have a 2D NumPy array of probabilities for m samples across n classes (the probabilities sum to 1 for each sample).

Assuming each sample can only be in one category, I want to create a new array with the same shape as the original, but with only binary values indicating which class had the highest probability.

Example:

[[0.2, 0.3, 0.5], [0.7, 0.1, 0.1]]

should be converted to:

[[0, 0, 1], [1, 0, 0]]

It seems argmax already does almost what I want, but instead of the indices I want an indicator matrix as described above.

Seems simple, but somehow I can't figure it out using standard numpy functions. I could use regular python loops of course, but it seems there should be a simpler way.

In case multiple classes have the same probability, I would prefer a solution which only selects one of the classes (I don't care which in this case).

Thanks!


asked Mar 22 '16 11:03

aKzenT

2 Answers

Here's one way:

In [112]: a
Out[112]: 
array([[ 0.2,  0.3,  0.5],
       [ 0.7,  0.1,  0.1]])

In [113]: a == a.max(axis=1, keepdims=True)
Out[113]: 
array([[False, False,  True],
       [ True, False, False]], dtype=bool)

In [114]: (a == a.max(axis=1, keepdims=True)).astype(int)
Out[114]: 
array([[0, 0, 1],
       [1, 0, 0]])

(But this will give a True value for each occurrence of the maximum in a row. See Divakar's answer for a nice way to select just the first occurrence of the maximum.)
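A minimal sketch of the tie behaviour described above, wrapped in a helper whose name (`max_indicator`) is hypothetical and chosen only for illustration. It assumes a 2D input and compares each row against its own maximum, so every tied maximum is marked:

```python
import numpy as np

def max_indicator(a):
    """Return 1 wherever an entry equals its row maximum (all ties are marked)."""
    a = np.asarray(a)
    # keepdims=True keeps the row maxima as a column so they broadcast over each row
    return (a == a.max(axis=1, keepdims=True)).astype(int)

# Second row has a tie between the first two classes
probs = np.array([[0.2, 0.3, 0.5],
                  [0.5, 0.5, 0.0]])
result = max_indicator(probs)
print(result)
print(result.sum(axis=1))  # a row sum above 1 signals a tie
```

Checking the row sums of the result is one simple way to detect whether any sample had tied maxima.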


answered Oct 19 '22 23:10

Warren Weckesser

If there are ties (two or more elements sharing the maximum in a row) and you want to select only one of them, here's one approach using np.argmax and broadcasting:

(A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)

Sample run -

In [296]: A
Out[296]: 
array([[ 0.2,  0.3,  0.5],
       [ 0.5,  0.5,  0. ]])

In [297]: (A.argmax(1)[:,None] == np.arange(A.shape[1])).astype(int)
Out[297]: 
array([[0, 0, 1],
       [1, 0, 0]])
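As an alternative sketch of the same idea, the indicator can also be built by writing a 1 at each row's `argmax` position in a zero array. The function name `argmax_indicator` is hypothetical, and the input is assumed to be 2D. `np.argmax` returns the first occurrence of the maximum, so exactly one class is selected per row even when there are ties:

```python
import numpy as np

def argmax_indicator(a):
    """Return a 0/1 array with a single 1 per row, at the first row maximum."""
    a = np.asarray(a)
    out = np.zeros(a.shape, dtype=int)
    # Pair each row index with the column index of that row's (first) maximum
    out[np.arange(a.shape[0]), a.argmax(axis=1)] = 1
    return out

A = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.5, 0.0]])
indicator = argmax_indicator(A)
print(indicator)

# Same result as the broadcasting comparison
broadcast = (A.argmax(1)[:, None] == np.arange(A.shape[1])).astype(int)
print(np.array_equal(indicator, broadcast))
```

Both forms do the same work; the indexed assignment avoids building an intermediate boolean array, which may matter for very large inputs.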

answered Oct 20 '22 00:10

Divakar
