I want to make a matrix x with shape (n_samples, n_classes) where each x[i] is a random one-hot vector. Here's a slow implementation:
x = np.zeros((n_samples, n_classes))
J = np.random.choice(n_classes, n_samples)
for i, j in enumerate(J):
    x[i, j] = 1
What's a more pythonic way to do this?
In scikit-learn, LabelEncoder creates an integer encoding of labels, and OneHotEncoder creates a one-hot encoding of integer-encoded values.
np.random.choice draws random samples, a common task in probability, Bayesian statistics, machine learning, and other data-related fields.
Create an identity matrix using np.eye:
x = np.eye(n_classes)
Then use np.random.choice to select rows at random:
x[np.random.choice(x.shape[0], size=n_samples)]
As a shorthand, just use:
np.eye(n_classes)[np.random.choice(n_classes, n_samples)]
Demo:
In [90]: np.eye(5)[np.random.choice(5, 100)]
Out[90]:
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       .... (... to 100)
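The same row-selection idea can also be written with NumPy's newer Generator API, which makes the draw reproducible. This is only a minimal sketch: the helper name random_one_hot and the seed value are assumptions for illustration, not part of the answer above.

```python
import numpy as np

def random_one_hot(n_samples, n_classes, seed=None):
    """Return an (n_samples, n_classes) array whose rows are random one-hot vectors."""
    # Hypothetical helper; seed=None gives a different draw on every call.
    rng = np.random.default_rng(seed)
    # Draw one class index per sample, uniformly from 0..n_classes-1.
    labels = rng.integers(0, n_classes, size=n_samples)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(n_classes)[labels]

x = random_one_hot(100, 5, seed=0)
print(x.shape)  # (100, 5)
```

Indexing np.eye with an integer array copies the selected rows, so the result can be modified without touching the identity matrix.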
For the assignment part, you can use advanced indexing:
import numpy as np

# initialize data
n_samples = 3
n_classes = 5
x = np.zeros((n_samples, n_classes))
J = np.random.choice(n_classes, n_samples)
# assign with advanced indexing
x[np.arange(n_samples), J] = 1
x
#array([[ 0., 1., 0., 0., 0.],
# [ 0., 1., 0., 0., 0.],
# [ 1., 0., 0., 0., 0.]])
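A quick way to convince yourself that the advanced-indexing assignment does the right thing is to check that every row sums to 1 and that argmax recovers J. A small sketch, reusing the variable names from the snippet above:

```python
import numpy as np

n_samples, n_classes = 3, 5
x = np.zeros((n_samples, n_classes))
J = np.random.choice(n_classes, n_samples)

# Pair row index i with column index J[i] and set those entries to 1.
x[np.arange(n_samples), J] = 1

# Each row holds exactly one 1, located at column J[i].
print(np.all(x.sum(axis=1) == 1))           # True
print(np.array_equal(x.argmax(axis=1), J))  # True
```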
Another option is to use OneHotEncoder from sklearn:
n_samples = 3
n_classes = 5
J = np.random.choice(n_classes, n_samples)
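from sklearn.preprocessing import OneHotEncoder
# Newer scikit-learn: n_values was replaced by categories (removed in 0.22),
# and sparse was renamed to sparse_output (1.2+).
enc = OneHotEncoder(categories=[np.arange(n_classes)], sparse_output=False)
enc.fit_transform(J.reshape(-1, 1))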
#array([[ 1., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 1.],
# [ 0., 1., 0., 0., 0.]])
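If you'd rather avoid both the identity matrix and the sklearn dependency, a broadcast comparison gives the same result. This is a different technique from the ones above, offered as a sketch only; the variable names follow the earlier snippets.

```python
import numpy as np

n_samples, n_classes = 3, 5
J = np.random.choice(n_classes, n_samples)

# J[:, None] has shape (n_samples, 1); np.arange(n_classes) has shape (n_classes,).
# Broadcasting the comparison yields an (n_samples, n_classes) boolean matrix
# that is True only where the column index equals the sampled class.
x = (J[:, None] == np.arange(n_classes)).astype(float)
```

Drop the .astype(float) call if a boolean mask is enough for your use case.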