Random one-hot matrix in numpy

I want to make a matrix x with shape (n_samples, n_classes) where each x[i] is a random one-hot vector. Here's a slow implementation:

x = np.zeros((n_samples, n_classes))
J = np.random.choice(n_classes, n_samples)
for i, j in enumerate(J):
    x[i, j] = 1

What's a more pythonic way to do this?

asked Jul 14 '17 02:07

michaelsnowden

2 Answers

Create an identity matrix using np.eye:

x = np.eye(n_classes)

Then use np.random.choice to select rows at random:

x[np.random.choice(x.shape[0], size=n_samples)]

As a shorthand, just use:

np.eye(n_classes)[np.random.choice(n_classes, n_samples)]

Demo:

In [90]: np.eye(5)[np.random.choice(5, 100)]
Out[90]: 
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       .... (... to 100)
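
For anyone who wants to run the identity-matrix trick end to end, here is a minimal sketch. The helper name random_one_hot_eye and its seed parameter are made up for illustration, and np.random.default_rng is swapped in for np.random.choice only so the draw is reproducible; the indexing step is the same as above.

```python
import numpy as np

def random_one_hot_eye(n_samples, n_classes, seed=None):
    """Return an (n_samples, n_classes) array of random one-hot rows.

    Hypothetical helper: the name and the seed argument are illustrative.
    """
    rng = np.random.default_rng(seed)
    # One class label per sample, drawn uniformly from 0..n_classes-1.
    labels = rng.integers(0, n_classes, size=n_samples)
    # Indexing the identity matrix with an integer array picks one row per label.
    return np.eye(n_classes)[labels]

x = random_one_hot_eye(n_samples=100, n_classes=5, seed=0)
print(x.shape)                # (100, 5)
print(x.sum(axis=1).min())    # 1.0, every row has exactly one 1
print(x.sum(axis=1).max())    # 1.0
```

Note that this builds an n_classes by n_classes identity matrix first, so for a very large number of classes the advanced-indexing assignment into a zeros array may use less memory.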

answered Oct 18 '22 00:10

cs95

For the assignment part, you can use advanced indexing:

import numpy as np

# initialize data
n_samples = 3
n_classes = 5
x = np.zeros((n_samples, n_classes))
J = np.random.choice(n_classes, n_samples)

# assign with advanced indexing
x[np.arange(n_samples), J] = 1

x
#array([[ 0.,  1.,  0.,  0.,  0.],
#       [ 0.,  1.,  0.,  0.,  0.],
#       [ 1.,  0.,  0.,  0.,  0.]])
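
One way to convince yourself the advanced-indexing assignment does what the loop did is to recover the labels afterwards. This is only a sketch: the seeded generator and the int8 dtype are my own choices, not part of the snippet above.

```python
import numpy as np

n_samples, n_classes = 6, 4
rng = np.random.default_rng(0)  # seeded generator, assumed here for reproducibility
J = rng.integers(0, n_classes, size=n_samples)

# An integer dtype is optional; the default float64 works the same way.
x = np.zeros((n_samples, n_classes), dtype=np.int8)
# Row index i is paired with column index J[i], so one cell per row is set.
x[np.arange(n_samples), J] = 1

# argmax along each row gives back the sampled class labels.
recovered = x.argmax(axis=1)
print(np.array_equal(recovered, J))  # True
```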

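Another option is OneHotEncoder from sklearn:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

n_samples = 3
n_classes = 5
J = np.random.choice(n_classes, n_samples)

# n_values and sparse were removed in newer scikit-learn releases;
# categories and sparse_output (scikit-learn >= 1.2) replace them
enc = OneHotEncoder(categories=[np.arange(n_classes)], sparse_output=False)
enc.fit_transform(J.reshape(-1,1))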

#array([[ 1.,  0.,  0.,  0.,  0.],
#       [ 0.,  0.,  0.,  0.,  1.],
#       [ 0.,  1.,  0.,  0.,  0.]])

answered Oct 17 '22 22:10

Psidom


Tags: python, arrays, numpy
