
Random one-hot matrix in numpy

I want to make a matrix x with shape (n_samples, n_classes) where each x[i] is a random one-hot vector. Here's a slow implementation:

x = np.zeros((n_samples, n_classes))
J = np.random.choice(n_classes, n_samples)
for i, j in enumerate(J):
    x[i, j] = 1

What's a more pythonic way to do this?

asked Jul 14 '17 02:07 by michaelsnowden



2 Answers

Create an identity matrix using np.eye:

x = np.eye(n_classes)

Then use np.random.choice to select rows at random:

x[np.random.choice(x.shape[0], size=n_samples)]

As a shorthand, just use:

np.eye(n_classes)[np.random.choice(n_classes, n_samples)]

Demo:

In [90]: np.eye(5)[np.random.choice(5, 100)]
Out[90]: 
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       .... (... to 100)
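
The same identity-indexing idea can be sketched as a self-contained script. This version is only an illustration: it assumes NumPy's newer Generator API (np.random.default_rng) in place of np.random.choice, and the seed and sizes are arbitrary values, not part of the answer above.

```python
import numpy as np

n_samples, n_classes = 100, 5   # illustrative sizes
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Draw one class label per sample, then pick the matching identity rows.
labels = rng.integers(0, n_classes, size=n_samples)
x = np.eye(n_classes)[labels]

# Sanity checks: right shape, exactly one 1 per row, located at the drawn label.
assert x.shape == (n_samples, n_classes)
assert (x.sum(axis=1) == 1).all()
assert (x.argmax(axis=1) == labels).all()
```

Indexing the identity matrix with an integer array returns a new array with one row per label, so the result is independent of the np.eye temporary.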
answered Oct 18 '22 00:10 by cs95


For the assignment part, you can use advanced indexing:

# initialize data
n_samples = 3
n_classes = 5
x = np.zeros((n_samples, n_classes))
J = np.random.choice(n_classes, n_samples)

# assign with advanced indexing
x[np.arange(n_samples), J] = 1

x
#array([[ 0.,  1.,  0.,  0.,  0.],
#       [ 0.,  1.,  0.,  0.,  0.],
#       [ 1.,  0.,  0.,  0.,  0.]])
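
The advanced-indexing assignment above could be wrapped in a small helper, sketched below. The function name random_one_hot, its rng and dtype parameters, and the choice of an integer dtype are assumptions made for illustration; they do not come from the answer.

```python
import numpy as np

def random_one_hot(n_samples, n_classes, rng=None, dtype=np.int8):
    """Hypothetical helper: random one-hot matrix plus the labels it encodes."""
    rng = np.random.default_rng() if rng is None else rng
    labels = rng.integers(0, n_classes, size=n_samples)
    x = np.zeros((n_samples, n_classes), dtype=dtype)
    # Row indices 0..n_samples-1 are paired elementwise with the column
    # indices in labels, so exactly one entry per row is set.
    x[np.arange(n_samples), labels] = 1
    return x, labels

x, labels = random_one_hot(3, 5, rng=np.random.default_rng(0))
```

Returning the labels alongside the matrix makes it easy to check the encoding, since x.argmax(axis=1) should reproduce them.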

Another option is OneHotEncoder from sklearn:

n_samples = 3
n_classes = 5
J = np.random.choice(n_classes, n_samples)

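from sklearn.preprocessing import OneHotEncoder
# scikit-learn >= 1.2: pass the classes via `categories` and use `sparse_output`
# (older releases spelled these `n_values=n_classes` and `sparse=False`)
enc = OneHotEncoder(categories=[np.arange(n_classes)], sparse_output=False)
enc.fit_transform(J.reshape(-1, 1))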

#array([[ 1.,  0.,  0.,  0.,  0.],
#       [ 0.,  0.,  0.,  0.,  1.],
#       [ 0.,  1.,  0.,  0.,  0.]])
answered Oct 17 '22 22:10 by Psidom