
Neural Network not learning - MNIST data - Handwriting recognition


I have written a neural network program. It works for logic gates, but when I try to use it to recognize handwritten digits, it simply does not learn.

Please find the code below:

    // A single neuron; this is needed in order to understand the remaining code
    typedef struct SingleNeuron
    {
        double                  outputValue;
        std::vector<double>     weight;
        std::vector<double>     deltaWeight;
        double                  gradient;
        double                  sum;
    } SingleNeuron;

Then I initialize the net. I set each weight to a random value between -0.5 and +0.5, and set sum and deltaWeight to 0.
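The initialization code is not shown in the question, so the following is only a minimal sketch of what the description above could look like. `MakeLayer` and its parameters are hypothetical names, and the sketch assumes (as the feed-forward code suggests) that each neuron stores one outgoing weight per neuron in the next layer.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Same fields as the SingleNeuron struct in the question.
struct SingleNeuron {
    double outputValue = 0.0;
    std::vector<double> weight;
    std::vector<double> deltaWeight;
    double gradient = 0.0;
    double sum = 0.0;
};

// Hypothetical helper (not from the question): builds one layer whose neurons
// each hold one outgoing weight per neuron in the next layer.
std::vector<SingleNeuron> MakeLayer(std::size_t neuronCount,
                                    std::size_t nextLayerSize,
                                    std::mt19937& rng)
{
    // Weights are drawn uniformly from [-0.5, 0.5).
    std::uniform_real_distribution<double> dist(-0.5, 0.5);
    std::vector<SingleNeuron> layer(neuronCount);
    for (SingleNeuron& n : layer) {
        n.weight.resize(nextLayerSize);
        for (double& w : n.weight) {
            w = dist(rng);
        }
        // Momentum terms start at zero; sum is already zero by default.
        n.deltaWeight.assign(nextLayerSize, 0.0);
    }
    return layer;
}
```

Seeding the generator with a fixed value (for example `std::mt19937 rng(42);`) makes runs repeatable, which helps when comparing parameter settings.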

Then comes the FeedForward:

    for (unsigned i = 0; i < inputValues.size(); ++i)
    {
        neuralNet[0][i].outputValue = inputValues[i];
        neuralNet[0][i].sum = 0.0;
        //  std::cout << "o/p Val = " << neuralNet[0][i].outputValue << std::endl;
    }

    for (unsigned i = 1; i < neuralNet.size(); ++i)
    {
        std::vector<SingleNeuron> prevLayerNeurons = neuralNet[i - 1];
        unsigned j = 0;
        double thisNeuronOPVal = 0;
        //  std::cout << std::endl;
        for (j = 0; j < neuralNet[i].size() - 1; ++j)
        {
            double sum = 0;
            for (unsigned k = 0; k < prevLayerNeurons.size(); ++k)
            {
                sum += prevLayerNeurons[k].outputValue * prevLayerNeurons[k].weight[j];
            }
            neuralNet[i][j].sum = sum;
            neuralNet[i][j].outputValue = TransferFunction(sum);
            //      std::cout << neuralNet[i][j].outputValue << "\t";
        }
        //      std::cout << std::endl;
    }

My transfer function and its derivative are given at the end.

After this I try to back-propagate using:

    // calculate output layer gradients
    for (unsigned i = 0; i < outputLayer.size() - 1; ++i)
    {
        double delta = actualOutput[i] - outputLayer[i].outputValue;
        outputLayer[i].gradient = delta * TransferFunctionDerivative(outputLayer[i].sum);
    }
    //  std::cout << "Found Output gradients "<< std::endl;

    // calculate hidden layer gradients
    for (unsigned i = neuralNet.size() - 2; i > 0; --i)
    {
        std::vector<SingleNeuron>& hiddenLayer = neuralNet[i];
        std::vector<SingleNeuron>& nextLayer = neuralNet[i + 1];

        for (unsigned j = 0; j < hiddenLayer.size(); ++j)
        {
            double dow = 0.0;
            for (unsigned k = 0; k < nextLayer.size() - 1; ++k)
            {
                dow += nextLayer[k].gradient * hiddenLayer[j].weight[k];
            }
            hiddenLayer[j].gradient = dow * TransferFunctionDerivative(hiddenLayer[j].sum);
        }
    }
    //  std::cout << "Found hidden layer gradients "<< std::endl;

    // from output to 1st hidden layer, update all weights
    for (unsigned i = neuralNet.size() - 1; i > 0; --i)
    {
        std::vector <SingleNeuron>& currentLayer = neuralNet[i];
        std::vector <SingleNeuron>& prevLayer = neuralNet[i - 1];

        for (unsigned j = 0; j < currentLayer.size() - 1; ++j)
        {
            for (unsigned k = 0; k < prevLayer.size(); ++k)
            {
                SingleNeuron& thisNeueon = prevLayer[k];
                double oldDeltaWeight = thisNeueon.deltaWeight[j];
                double newDeltaWeight = ETA * thisNeueon.outputValue * currentLayer[j].gradient + (ALPHA * oldDeltaWeight);
                thisNeueon.deltaWeight[j] = newDeltaWeight;
                thisNeueon.weight[j] += newDeltaWeight;
            }
        }
    }

These are the TransferFunction and its derivative:

    double TransferFunction(double x)
    {
        double val;
        //val = tanh(x);
        val = 1 / (1 + exp(x * -1));
        return val;
    }

    double TransferFunctionDerivative(double x)
    {
        //return 1 - x * x;
        double val = exp(x * -1) / pow((exp(x * -1) + 1), 2);
        return val;
    }

One thing I observed: if I use the standard sigmoid function as my transfer function AND pass the output of the neuron to the transfer function, the result is INFINITY. But tanh(x) works fine with this value.

So if I use 1/(1+e^(-x)) as the transfer function I have to pass the sum of net inputs, and with tanh as the transfer function I have to pass the output of the current neuron.

I do not completely understand why this is so; maybe this calls for a separate question.
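One possible reading of the sum-versus-output observation, offered as a sketch rather than a diagnosis: both derivatives can be written in terms of the neuron's output y. For the sigmoid the derivative is y(1 - y), and for tanh it is 1 - y², which is the form of the commented-out `1 - x * x` line and therefore expects the output. The formula currently used in `TransferFunctionDerivative` is the sigmoid derivative written in terms of the sum. The function names below are hypothetical.

```cpp
#include <cmath>

double Sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Sigmoid derivative as a function of the pre-activation sum x
// (the form used by TransferFunctionDerivative in the question).
double SigmoidDerivativeFromSum(double x)
{
    const double e = std::exp(-x);
    return e / ((1.0 + e) * (1.0 + e));
}

// The same derivative as a function of the output y = Sigmoid(x).
double SigmoidDerivativeFromOutput(double y) { return y * (1.0 - y); }

// tanh derivative as a function of the output y = tanh(x);
// this matches the commented-out `1 - x * x` form.
double TanhDerivativeFromOutput(double y) { return 1.0 - y * y; }
```

Whichever form is chosen, the argument has to match it: a sum-based formula needs the stored `sum`, and an output-based formula needs `outputValue`.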

But this question is really about something else: NETWORK IS WORKING FOR LOGIC GATES BUT NOT FOR CHARACTER RECOGNITION

I have tried many combinations of learning rate, acceleration, number of hidden layers and hidden layer sizes. Please find the results below:

    AvgErr : 0.299399         #Pass799
    AvgErr : 0.305071         #Pass809
    AvgErr : 0.303046         #Pass819
    AvgErr : 0.299569         #Pass829
    AvgErr : 0.30413          #Pass839
    AvgErr : 0.304165         #Pass849
    AvgErr : 0.300529         #Pass859
    AvgErr : 0.302973         #Pass869
    AvgErr : 0.299238         #Pass879
    AvgErr : 0.304708         #Pass889
    AvgErr : 0.30068          #Pass899
    AvgErr : 0.302582         #Pass909
    AvgErr : 0.301767         #Pass919
    AvgErr : 0.303167         #Pass929
    AvgErr : 0.299551         #Pass939
    AvgErr : 0.301295         #Pass949
    AvgErr : 0.300651         #Pass959
    AvgErr : 0.297867         #Pass969
    AvgErr : 0.304221         #Pass979
    AvgErr : 0.303702         #Pass989

After looking at the results you might think I am simply stuck in a local minimum, but please wait and read on:

    Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
    Output = 0.0910903, 0.105674, 0.064575, 0.0864824, 0.128682, 0.0878434, 0.0946296, 0.154405, 0.0678767, 0.0666924

    Input = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    Output = 0.0916106, 0.105958, 0.0655508, 0.086579, 0.126461, 0.0884082, 0.110953, 0.163343, 0.0689315, 0.0675822

    Input = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    Output = 0.105344, 0.105021, 0.0659517, 0.0858077, 0.123104, 0.0884107, 0.116917, 0.161911, 0.0693426, 0.0675156

    Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
    Output = 0.107113, 0.101838, 0.0641632, 0.0967766, 0.117149, 0.085271, 0.11469, 0.153649, 0.0672772, 0.0652416

Above is the output of epochs #996, #997, #998 and #999.

So the network is simply not learning. For this example I used ALPHA = 0.4, ETA = 0.7, 10 hidden layers of 100 neurons each, and the average is taken over 10 epochs. If you are worried about the learning rate being 0.4 or about there being so many hidden layers, I have already tried variations. For example, with a learning rate of 0.1 and 4 hidden layers of 16 neurons each:

    Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
    Output = 0.0883238, 0.0983253, 0.0613749, 0.0809751, 0.124972, 0.0897194, 0.0911235, 0.179984, 0.0681346, 0.0660039

    Input = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    Output = 0.0868767, 0.0966924, 0.0612488, 0.0798343, 0.120353, 0.0882381, 0.111925, 0.169309, 0.0676711, 0.0656819

    Input = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    Output = 0.105252, 0.0943837, 0.0604416, 0.0781779, 0.116231, 0.0858496, 0.108437, 0.1588, 0.0663156, 0.0645477

    Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
    Output = 0.102023, 0.0914957, 0.059178, 0.09339, 0.111851, 0.0842454, 0.104834, 0.149892, 0.0651799, 0.063558

I am sure that I have missed something, but I am not able to figure out what. I have read Tom Mitchell's algorithm many times, and I still don't know what is wrong. Every example I solve by hand works! (Please don't ask me to solve MNIST images by hand ;) ) I do not know where to change the code or what to do. Please help.

EDIT -- Uploading more data as per suggestions in comments

1 Hidden Layer of 32 -- still no learning.

Expected output -- The inputs are images of the digits 0-9, so the target is a simple vector describing which digit the current image shows: that bit is 1 and all others are 0. I want the output to be as close to 1 as possible for that particular bit and close to 0 for the others. E.g. if the input is Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0], I would want the output to be something like Output = 0.002023, 0.0914957, 0.059178, 0.09339, 0.011851, 0.0842454, 0.924834, 0.049892, 0.0651799, 0.063558 (this is vague, hand-generated).
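The target encoding described above is commonly called one-hot encoding. A minimal sketch, assuming ten classes and using the hypothetical helper names `OneHot` and `PredictedDigit` (neither appears in the question):

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Builds the target vector for a digit: 1 at the digit's index, 0 elsewhere.
// Throws std::out_of_range if digit >= classCount.
std::vector<double> OneHot(std::size_t digit, std::size_t classCount = 10)
{
    std::vector<double> target(classCount, 0.0);
    target.at(digit) = 1.0;
    return target;
}

// Reads a prediction back out of the network output by taking the index
// of the largest activation. Assumes output is non-empty.
std::size_t PredictedDigit(const std::vector<double>& output)
{
    return static_cast<std::size_t>(std::distance(
        output.begin(), std::max_element(output.begin(), output.end())));
}
```

Applied to the hand-generated output above, `PredictedDigit` picks index 6, which matches the bit set in the example vector.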

Here are links to other researchers' work.

Stanford

SourceForge -- This is rather a library

Besides these two, there are many other sites showing demos.

Things are working quite well for them. If I set my network parameters (ALPHA, ETA) like theirs, I do not get results like theirs, which confirms that something is wrong with my code.

EDIT 2

Adding more failure cases

Acceleration - 0.7, Learning Rate 0.1

Acceleration - 0.7, Learning Rate 0.6

In both of the above cases there were 3 hidden layers of 32 neurons each.

Adorn asked Feb 26 '15 15:02


1 Answer

This answer is copied from the OP's comment on the question.

I solved the puzzle. I had made the worst possible mistake: I was giving the network the wrong input. I used OpenCV to read the images, and instead of using reshape I was using resize, so the input was a linear interpolation of each image rather than its original pixels. There was nothing wrong with the code. My network is now 784 - 65 - 10 and gives 96.43% accuracy.
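To illustrate the distinction: in OpenCV, `cv::Mat::reshape` changes only the shape of the data and leaves every pixel value intact, whereas `cv::resize` resamples the image by interpolation. The sketch below shows the intended flattening without depending on OpenCV. `FlattenImage` is a hypothetical helper, and the scaling to [0, 1] is an assumption, since the answer does not say how pixel values were normalised.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: flattens a row-major grayscale image into a vector of
// network inputs. No resampling takes place; each pixel becomes exactly one
// input, in row-by-row order.
std::vector<double> FlattenImage(
    const std::vector<std::vector<std::uint8_t>>& image)
{
    std::vector<double> input;
    for (const auto& row : image) {
        for (std::uint8_t pixel : row) {
            // Assumption: scale 0..255 pixel intensities to 0..1.
            input.push_back(pixel / 255.0);
        }
    }
    return input;
}
```

For a 28 x 28 MNIST image this yields 784 values, which matches the input layer size of the 784 - 65 - 10 network mentioned above.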

2 revs, 2 users 60% answered Oct 21 '22 07:10
