
Binary numbers instead of one hot vectors

When doing logistic regression, it is common practice to use one-hot vectors as the desired output, so the number of classes equals the number of nodes in the output layer. We don't use the word's index in the vocabulary (or a class number in general) because that may falsely indicate closeness between two classes. But why can't we use binary numbers instead of one-hot vectors?

i.e., if there are 4 classes, we can represent each class as 00, 01, 10, 11, resulting in log2(number of classes) nodes in the output layer.
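For concreteness, here is a minimal Python/numpy sketch of the two label encodings being compared (the 4-class setup is just an example):

import numpy as np

classes = ['apple', 'orange', 'table', 'chair']   # example 4-class problem

# One-hot targets: one output node per class -> 4 nodes
one_hot = np.eye(len(classes))                    # rows: 1000, 0100, 0010, 0001

# Binary-code targets: ceil(log2(4)) = 2 output nodes
num_bits = int(np.ceil(np.log2(len(classes))))
binary = np.array([[(i >> b) & 1 for b in reversed(range(num_bits))]
                   for i in range(len(classes))]) # rows: 00, 01, 10, 11

print(one_hot)
print(binary)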

asked Oct 23 '16 by Raghuram Vadapalli

People also ask

Can one-hot encoding be used for binary?

One-hot encoding can be applied to the integer representation. This is where the integer encoded variable is removed and a new binary variable is added for each unique integer value.

What is difference between one-hot and binary encoding?

With binary encoding, as was used in the traffic light controller example, each state is represented as a binary number. Because K values can be represented with ceil(log2 K) bits, a system with K states needs only ceil(log2 K) bits of state. In one-hot encoding, a separate bit of state is used for each state.
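As a quick illustration of the bit counts mentioned above (K = 16 is an arbitrary example):

from math import ceil, log2

K = 16                   # hypothetical number of states
print(ceil(log2(K)))     # binary encoding: 4 bits of state
print(K)                 # one-hot encoding: 16 bits, one per state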

What are the problems with one-hot vector representation of words?

It is easy to implement and can work really fast, but in the process it loses the meaning of the word within the sentence, and thus the context of the sentence. Because of this, one-hot encoding is not widely used in many natural language processing applications.
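A small sketch of that point: one-hot word vectors are mutually orthogonal, so they carry no similarity information between words (the tiny vocabulary here is made up):

import numpy as np

vocab = ['cat', 'dog', 'car']     # made-up vocabulary
vectors = np.eye(len(vocab))      # one one-hot vector per word

# Cosine similarity between any two different words is always 0,
# even for semantically related words like 'cat' and 'dog'.
cat, dog = vectors[0], vectors[1]
print(np.dot(cat, dog) / (np.linalg.norm(cat) * np.linalg.norm(dog)))   # 0.0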


1 Answer

It is fine if you encode with binary, but you will probably need to add another layer (or a filter), depending on your task and model, because your encoding now implies invalid shared features due to the binary representation.

For example, a binary encoding for input (x = [x1, x2]):

'apple' = [0, 0]
'orange' = [0, 1]
'table' = [1, 0]
'chair' = [1, 1]

This means that 'orange' and 'chair' share the same feature x2. Now, with predictions for two classes y:

'fruit' = 0
'furniture' = 1

And a linear model (weights W = [w1, w2] and bias b) for a labeled data sample, trained by minimizing the squared error:

(argmin W) Loss = (y - (w1 * x1 + w2 * x2 + b))^2

Whenever you update w2 to push 'chair' towards 'furniture', you get an undesirable update that also pushes 'orange' towards 'furniture'.
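A tiny numpy sketch of this coupling (the numbers are made up; it just shows that a single gradient step taken for 'chair' also moves the score of 'orange', because both codes have x2 = 1):

import numpy as np

x_chair  = np.array([1.0, 1.0])   # 'chair'  -> furniture (y = 1)
x_orange = np.array([0.0, 1.0])   # 'orange' -> fruit     (y = 0)

W = np.zeros(2)                   # [w1, w2]
b = 0.0
lr = 0.1

def predict(x):
    return W @ x + b

# One gradient step on the squared error, using the 'chair' sample only
y = 1.0
err = predict(x_chair) - y
W = W - lr * err * x_chair
b = b - lr * err

# The prediction for 'orange' has moved towards 'furniture' as well,
# even though we never trained on it: the update to w2 is shared.
print(predict(x_orange))          # 0.2 instead of 0.0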

In this particular case, you can probably fix this by adding another layer U = [u1, u2]. For the extra layer to help, each hidden unit needs its own weights and a non-linearity f (otherwise the stack collapses back to a single linear map):

(argmin U,W) Loss = (y - (u1 * f(w11 * x1 + w12 * x2 + b1) +
                          u2 * f(w21 * x1 + w22 * x2 + b2) +
                          b3))^2
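As a hand-wired sketch of why an extra layer with a non-linearity helps: if the binary codes happened to pair the classes in the XOR pattern (say [0,0] and [1,1] in one class, [0,1] and [1,0] in the other, which no single linear layer can fit), a 2-unit hidden layer with ReLU already separates them. The weights below are set by hand, purely for illustration:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def two_layer(x):
    # Hidden layer: each unit has its own weights
    h1 = relu(x[0] + x[1])         # counts the set bits
    h2 = relu(x[0] + x[1] - 1.0)   # fires only when both bits are set
    # Output layer
    return h1 - 2.0 * h2

for code in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(code, two_layer(np.array(code, dtype=float)))
# -> 0.0 for [0,0] and [1,1], 1.0 for [0,1] and [1,0]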

OK, so why not avoid this misrepresentation in the first place by using one-hot encoding? :)

answered Nov 24 '22 by Mehdi