I am interested in learning about the neural networks, and as an example, I tried with the following dataset which has been derived experimentally.
I am using the following input vector for my neural network;
X = [1 1; 1 2; 1 3; 1 4; 4 1; 4 2; 4 3; 4 4; 7 1;7 2; 7 3; 7 4]';
Tc = [1 1 2 3 1 1 2 2 1 1 2 2];
I want to divide the input data into three classes, described by the input vector Tc. Then I am converting the target class indices Tc to vectors T and spread value I am using is 1.
Using newpnn function in MATLAB, I am getting the decision boundaries for the three classes.
I have a doubt regarding the validating if the decision boundary is appropriate. I am validating the output with a single data X =[2;3.55]
belongs to class 2. It is depicted by the black dot in the output plot. Blue is class 1. The yellow is the region belonging to class 2. Red is class 3.
As shown in the plot, the prediction by the neural network was found to be class 2, which coincides with the actual class of the set.
So, does that mean that my neural network is correct and validated?
P.S. I have a basic understanding of Neural Networks. Also, I understand the concept of having more training examples and validation sets. I am expecting an answer catering to the available details, as I cannot get more data experimentally.
Validation data is to "validate" that your neural network hasn't rather memorized your training data and has thus actually learned some meaningful aspects of the data so that the model can be later on used (generalized) to unseen, held-out "test dataset".
Probabilistic neural networks can be used for classification problems. When an input is presented, the first layer computes distances from the input vector to the training input vectors and produces a vector whose elements indicate how close the input is to a training input.
The PNN is based on Bayes theory and was developed in 1990 by Specht (1990). It estimates the probability of a sample being part of a learned category. There are four layers in PNN: an input layer, a pattern layer, a summation layer, and a decision layer. The PNN calculates most of the terms from the training data.
A probabilistic neural network (PNN) is a sort of feedforward neural network used to handle classification and pattern recognition problems. In the PNN technique, the parent probability distribution function (PDF) of each class is approximated using a Parzen window and a non-parametric function.
Hmm, I think that you doesn't understand well was validation is in term of neural networks. You can't check your network only with one sample. So, I'll try to teach you what I know about validating neural networks. It's a long statistical process that involves some reflexion about "real world data", "expected behaviour", ... You can't validate something with like 10-20 data and one validating point.
Generally, when you teach a neural network you should have 3 sets:
(source: https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set)
For instance, we're building an algorithm used to check if a person "has some chance to become rich". Here's how you would make and valid your neural network.
It's how you can make a reliable neural network (don't forget, the two main problems are don't checking the final result and overfitting).
As overfitting is a key notion. I'll try to define it a bit and to give an example. Overfitting is making an algorithm that is able to build very close approximations but that aren't able to predict anything (what I called "dummy algorithm").
Let's compare for instance a linear interpolator and a polynomial (1000000th degree, very high degree) one. Our polynomial algorithm will probably fit very well the data (extreme overfitting is fitting exactly all our data). But, it won't be able to predict anything at all.
For the example below, if we have in our validation set (extracted from real world data) a point in (2,-2) and (-1,2), we can assume that our polynomial interpolation was clearly overfitted because it suggests values such as (-1,10) and (2,20). The linear one should be closer.
I hope it will help. (note that I'm not an expert in that domain but I tried to make a very readable and simple answer, so if anything is false, feel free to comment :) )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With