
Neural Network Categorical Data Implementation

I've been learning to work with neural networks as a hobby project, but I'm at a complete loss as to how to handle categorical data. I read the article http://visualstudiomagazine.com/articles/2013/07/01/neural-network-data-normalization-and-encoding.aspx, which covers normalization of the input data and shows how to preprocess categorical data using effects encoding. I understand the concept of breaking the categories into vectors, but have no idea how to actually implement this.

For example, if I'm using countries as categorical data (e.g. Finland, Thailand), would I collapse the resulting vector into a single number to be fed to a single input, or would I have a separate input for each component of the vector? Under the latter, if there are 196 different countries, I would need 196 different inputs just to process this one piece of data. If many different categorical variables are being fed to the network, I can see this becoming unwieldy very fast.

Is there something I'm missing? How exactly is categorical data mapped to neuron inputs?

user3450211 asked Mar 22 '14 17:03



1 Answer

Neural network inputs

As a rule of thumb: different classes and categories should have their own input signals.


Why you can't encode it with a single input

Since each neuron passes a weighted sum of its inputs through an activation function, a higher input value contributes proportionally more to that sum.

For a positive weight, a higher input value therefore makes the neuron more likely to fire.

Unless you want to tell the network that Thailand is "better" than Finland, you should not encode the country as a single input signal such as InputValue(Finland) = 24, InputValue(Thailand) = 140.

(Image: how not to format the input)
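To make the problem with a single numeric input concrete, here is a minimal sketch of what one neuron computes before its activation function. The codes 24 and 140 come from the example above; the function name `pre_activation` and the weight value 0.05 are assumptions chosen purely for illustration.

```python
# Arbitrary integer codes from the example above.
country_code = {"Finland": 24, "Thailand": 140}


def pre_activation(x, weight=0.05, bias=0.0):
    """Weighted sum a single neuron computes for one scalar input.

    The weight and bias are hypothetical values, not learned ones.
    """
    return weight * x + bias


finland = pre_activation(country_code["Finland"])
thailand = pre_activation(country_code["Thailand"])

# With any positive weight, Thailand always drives the neuron harder than
# Finland, only because its arbitrary code happens to be larger.
print(finland, thailand)
```

The network sees an ordering and a distance between the two countries that exist only in the numbering scheme, not in the data.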


How it should be encoded

Each country deserves its own input signal, typically 1 for the country that is present and 0 for all others, so that every country contributes equally to activating the neurons and none is implicitly ranked above another.
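One common way to give each category its own input is one-hot (1-of-N) encoding. The sketch below is a plain-Python illustration, not a reference implementation; the helper name `one_hot` and the three-country list are assumptions made for the example. With the 196 countries from the question, the same function would return a vector of length 196, so the network would indeed have 196 inputs for this feature.

```python
def one_hot(category, categories):
    """Return a list with 1.0 at the position of `category` and 0.0 elsewhere.

    `categories` is the fixed, ordered list of all known categories; each
    position corresponds to one network input.
    """
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return [1.0 if c == category else 0.0 for c in categories]


# Hypothetical, shortened country list for illustration.
COUNTRIES = ["Finland", "Thailand", "Norway"]

finland_inputs = one_hot("Finland", COUNTRIES)    # [1.0, 0.0, 0.0]
thailand_inputs = one_hot("Thailand", COUNTRIES)  # [0.0, 1.0, 0.0]
```

Every country activates exactly one input with the same magnitude, so the network has to learn what each country means from the training data rather than from the numbering.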

jorgenkg answered Dec 21 '22 10:12
