Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does this one-hot vector conversion work?

When I was working on my machine learning project, I was looking for a line of code to turn my labels into one-hot vectors. I came across this nifty line of code from u/benanne on Reddit.

np.eye(n_labels)[target_vector]

For example, for a target_vector = np.array([1, 4, 2, 1, 0, 1, 3, 2]), it returns the one-hot coded values:

np.eye(5)[target_vector]
Out: 
array([[ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.],
       ..., 
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  0.]])

While it definitely does work, I'm not sure how or why it works.

like image 279
Suha Hussain Avatar asked Jul 12 '17 23:07

Suha Hussain


People also ask

How does OneHotEncoder work?

One-hot encoding is the process by which categorical data are converted into numerical data for use in machine learning. Categorical features are turned into binary features that are “one-hot” encoded, meaning that if a feature is represented by that column, it receives a 1 . Otherwise, it receives a 0 .

What are examples of a one-hot encoded vector?

For example, [0,0,0,1,0] would be a valid one-hot encoding that would tell you the classification in position 4 (or 3 in array indexing) is the classification of the object. Contrastingly, [0,1,0,1,0],and [1,1,1,1,1] are examples of invalid one-hot encodings.

Why do you use one-hot encoding?

We should prefer using the One Hot Encoding method when : The categorical features present in the data is not ordinal (like the countries above) When the number of categorical features present in the dataset is less so that the one-hot encoding technique can be effectively applied while building the model.


1 Answers

It's rather simple. np.eye(n_labels) creates an identity matrix of size n_labels then you use your target_vector to select rows, corresponding to the value of the current target, from that matrix. Since each row in an identity matrix contains exactly one 1 element and the rest 0, each row will be a valid 'one hot coding'.

like image 82
JohanL Avatar answered Oct 14 '22 15:10

JohanL