Hopefully the last NN question you'll get from me this weekend, but here goes :)
Is there a way to handle an input that you "don't always know"... so it doesn't affect the weightings somehow?
So, if I ask someone whether they are male or female and they would rather not answer, is there a way to disregard this input? Perhaps by placing it squarely in the centre (0.5, assuming the inputs are coded as 0 and 1)?
Thanks
Abstract: While data are the primary fuel for machine learning models, they often suffer from missing values, especially when collected in real-world scenarios. However, many off-the-shelf machine learning models, including artificial neural network models, are unable to handle these missing values directly.
You can train on this data: keep the missing dimensions at zero, or try substituting the mean instead of 0.0. Whether correct predictions can be made depends entirely on the data, though. The only way to find out is to train the neural network and evaluate it.
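To make the zero-versus-mean suggestion concrete, here is a minimal NumPy sketch. It assumes missing answers are stored as NaN. The function name `fill_missing` and the returned "was missing" mask are my own illustrative additions, not something the answer prescribes.

```python
import numpy as np

def fill_missing(X, strategy="mean"):
    """Replace NaNs column by column; also return a boolean mask of what was missing.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if strategy == "mean":
        # Per-column mean computed over the observed (non-NaN) values only.
        fill = np.nanmean(X, axis=0)
    elif strategy == "zero":
        fill = np.zeros(X.shape[1])
    else:
        raise ValueError("strategy must be 'mean' or 'zero'")
    # fill has shape (n_features,) and broadcasts across the rows of X.
    X_filled = np.where(missing, fill, X)
    return X_filled, missing

# Column 0 could be the 0/1 coded answer, NaN meaning "would rather not say".
X = np.array([[1.0, np.nan],
              [0.0, 0.5],
              [np.nan, 1.5]])
X_mean, was_missing = fill_missing(X, "mean")
X_zero, _ = fill_missing(X, "zero")
```

Whether you also feed the mask to the network as extra inputs is a design choice. Either way, the only real test is to train with each variant and compare on held-out data.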
Brown and Kros [2003] present a comprehensive summary of the past research conducted on the topic of the impact of missing data on various data mining techniques including neural networks and pointed out that missing values can cause variance understatement, distortion of distribution, and correlation depression in the ...
In addition, in the last few years, deep learning has been extensively used in different fields, including missing data imputation, which has led to a significant improvement of the imputation performance through using a large amount of training data.
Neural networks are fairly resistant to noise, which is one of their big advantages. You may want to try scaling the inputs to the range [-1.0, 1.0] instead, with 0 standing for "no answer". That way the missing input contributes 0.0 to the weighted sum, and because each weight update is proportional to its input, the weights attached to that input are not updated for that example.
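A quick way to convince yourself of the "no learning" claim is to compute the gradient for a single neuron. This is only a sketch under my own assumptions (a tanh unit, squared error, made-up numbers), not code from any particular library.

```python
import numpy as np

def weight_gradient(x, w, target):
    """Gradient of 0.5 * (y - target)**2 with respect to w, for y = tanh(w . x)."""
    y = np.tanh(w @ x)
    # Error signal at the neuron: dE/dy * dy/dnet.
    delta = (y - target) * (1.0 - y ** 2)
    # Each weight's gradient is the error signal times that weight's input.
    return delta * x

# Inputs coded in [-1, 1]; the middle one is the unanswered question, coded as 0.
x = np.array([1.0, 0.0, -1.0])
w = np.array([0.3, -0.7, 0.2])  # arbitrary illustrative weights
grad = weight_gradient(x, w, target=1.0)
```

Here `grad[1]` is zero while the other two entries are not, so a gradient step leaves the weight on the unanswered input alone for this example. The other weights (and any bias) still learn from it as usual.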
Probably the best book I've ever had the misfortune of not finishing (yet!) is Neural Networks and Learning Machines by Simon S. Haykin. In it, he talks about all kinds of issues, including the way you should distribute your inputs/training set for the best training, etc. It's a really great book!
You probably know this or suspect it, but there's no statistical basis for guessing or supplying the missing values by averaging over the range of possible values, etc.
For NNs in particular, there are quite a few techniques available. The technique I use (and have coded myself) is one of the simpler ones, but it has a solid statistical basis and is still used today. The academic paper that describes it is here.
The theory that underlies this technique is weighted integration over the incomplete data. In practice, no integrals are evaluated; instead they are approximated by closed-form solutions of Gaussian Basis Function networks. As you'll see in the paper (which is a step-by-step explanation), it's simple to implement in your backprop algorithm.
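To give a feel for why the integral has a closed form, here is a rough sketch of prediction only. It assumes a normalized Gaussian basis function network with axis-aligned Gaussians and equal mixing weights, and it may well differ in detail from the paper. Under those assumptions, integrating a basis function over the missing dimensions simply leaves the same Gaussian over the observed dimensions. The name `gbf_predict` and all numbers are hypothetical.

```python
import numpy as np

def gbf_predict(x, centers, widths, weights):
    """Normalized Gaussian basis function network; NaN entries of x count as missing.

    centers, widths: arrays of shape (n_units, n_inputs); weights: shape (n_units,).
    """
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    # Marginalizing an axis-aligned Gaussian density over the missing
    # dimensions leaves the Gaussian density over the observed ones.
    diff = (x[observed] - centers[:, observed]) / widths[:, observed]
    log_act = -0.5 * np.sum(diff ** 2, axis=1) - np.sum(np.log(widths[:, observed]), axis=1)
    # Subtract the max before exponentiating for numerical stability;
    # the shift cancels in the normalization below.
    act = np.exp(log_act - log_act.max())
    return float(weights @ act / act.sum())

centers = np.array([[0.0, 0.0], [0.0, 1.0]])
widths = np.ones((2, 2))
weights = np.array([1.0, 3.0])

full = gbf_predict([0.0, 0.0], centers, widths, weights)        # both inputs known
partial = gbf_predict([0.0, np.nan], centers, widths, weights)  # second input missing
```

With the second input missing, the two units cannot be told apart from the first input alone, so the prediction falls back to the average of their output weights. Training with missing values needs the corresponding gradient terms, which this sketch does not cover.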