I know that categorical data should be one-hot encoded before training a machine learning algorithm. I also know that for multivariate linear regression I need to exclude one of the encoded variables to avoid the so-called dummy variable trap.
Ex: If I have a categorical feature "size" with values "small", "medium", "large", then after one-hot encoding I would have something like:

small  medium  large  other-feature
  0      1       0        2999
So to avoid the dummy variable trap I need to remove one of the three columns, for example the column "small".
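For context, this is a minimal sketch of the two encodings using pandas (the column names and data are made up for illustration). Note that `drop_first=True` drops the first category in sorted order, which here is "large" rather than "small":

```python
import pandas as pd

df = pd.DataFrame({
    "size": ["small", "medium", "large", "medium"],
    "other_feature": [2999, 1200, 500, 800],
})

# Full one-hot encoding: one column per category.
full = pd.get_dummies(df, columns=["size"], dtype=int)

# drop_first=True removes one category per feature,
# avoiding the dummy variable trap for linear models.
reduced = pd.get_dummies(df, columns=["size"], drop_first=True, dtype=int)

print(full.columns.tolist())     # includes size_large, size_medium, size_small
print(reduced.columns.tolist())  # one size_* column fewer
```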
Should I do the same when training a neural network? Or is this purely for multivariate regression?
Thanks.
As stated here, the dummy variable trap needs to be avoided (one category of each categorical feature removed after encoding but before training) on the input of algorithms that consider all the predictors together as a linear combination, such as linear regression and other linear models.
If you remove a category from the input of a neural network that employs weight decay, the model will instead become biased in favor of the omitted category, since its effect is absorbed into the unpenalized intercept rather than a penalized weight.
Even though no information is lost when one category is omitted after encoding a feature, other algorithms have to infer the effect of the omitted category indirectly, through the combination of all the remaining categories, doing more computation to reach the same result.