When I was working on my machine learning project, I was looking for a way to turn my labels into one-hot vectors, and I came across this nifty line of code from u/benanne on Reddit.
np.eye(n_labels)[target_vector]
For example, for a target_vector = np.array([1, 4, 2, 1, 0, 1, 3, 2]), it returns the one-hot encoded values:
np.eye(5)[target_vector]
Out:
array([[ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.],
       ...,
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  0.]])
While it definitely does work, I'm not sure how or why it works.
One-hot encoding is the process of converting categorical data into numerical data for use in machine learning. Each categorical feature is turned into a set of binary features that are "one-hot" encoded: the column corresponding to the category receives a 1, and every other column receives a 0.
For example, [0, 0, 0, 1, 0] is a valid one-hot encoding: it tells you that the class in position 4 (index 3 in array indexing) is the class of the object. By contrast, [0, 1, 0, 1, 0] and [1, 1, 1, 1, 1] are invalid one-hot encodings, since a valid encoding contains exactly one 1.
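The "exactly one 1" property is easy to check in code. Here is a small sketch (the helper name is my own, not from the original post) that tests whether a vector is a valid one-hot encoding:

```python
import numpy as np

def is_one_hot(v):
    """A vector is one-hot iff every entry is 0 or 1 and exactly one entry is 1."""
    v = np.asarray(v)
    return bool(np.isin(v, [0, 1]).all() and v.sum() == 1)

print(is_one_hot([0, 0, 0, 1, 0]))  # valid: a single 1
print(is_one_hot([0, 1, 0, 1, 0]))  # invalid: two 1s
print(is_one_hot([1, 1, 1, 1, 1]))  # invalid: every entry is 1
```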
One-hot encoding is the right choice when the categorical feature is not ordinal (the categories have no inherent order, e.g. country names), and when the number of distinct categories is small enough that adding one binary column per category keeps the model manageable.
It's rather simple. np.eye(n_labels) creates an identity matrix of size n_labels, and indexing it with target_vector selects, for each target value, the corresponding row of that matrix. Since each row of an identity matrix contains exactly one 1 element and the rest 0s, each selected row is a valid one-hot encoding.
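The mechanism can be sketched step by step (a minimal example using the same target_vector as above, with n_labels = 5):

```python
import numpy as np

n_labels = 5
target_vector = np.array([1, 4, 2, 1, 0, 1, 3, 2])

# Step 1: an identity matrix -- row i is the one-hot encoding of label i.
eye = np.eye(n_labels)
print(eye[1])  # -> [0. 1. 0. 0. 0.]

# Step 2: fancy (integer-array) indexing picks one row per target value,
# so the result has shape (len(target_vector), n_labels).
one_hot = eye[target_vector]
print(one_hot.shape)  # -> (8, 5)

# Sanity check: argmax recovers the original labels, since the single 1
# in each row sits at the position of the target value.
assert (one_hot.argmax(axis=1) == target_vector).all()
```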