Number of hidden layers, units per hidden layer, and epochs until a neural network starts behaving acceptably on training data

I am trying to solve this Kaggle problem using neural networks. I am using the PyBrain Python library.

It's a classic supervised learning problem. In the following code, the 'data' variable is a numpy array (892*8): 7 fields are my features and 1 field is my output value, which can be '0' or '1'.

from pybrain.datasets import ClassificationDataSet
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import SigmoidLayer, TanhLayer  # needed for hiddenclass/outclass below

# column 0 is the target ('0' or '1'), columns 1..7 are the features
dataset = ClassificationDataSet(7, 1)
for i in data:
    dataset.appendLinked(i[1:], i[0])

# 7 inputs, two hidden layers (9 and 7 units), 1 output
net = buildNetwork(7, 9, 7, 1, bias=True, hiddenclass=SigmoidLayer, outclass=TanhLayer)
trainer = BackpropTrainer(net, learningrate=0.04, momentum=0.96, weightdecay=0.02, verbose=True)
trainer.trainOnDataset(dataset, 8000)   # train for 8000 epochs
trainer.testOnData(verbose=True)        # evaluate on the training data

After training my neural network, when I test it on the training data, it always gives nearly the same output for every input. For example:

Testing on data:
out:     [  0.075]
correct: [  1.000]
error:  0.42767858
out:     [  0.075]
correct: [  0.000]
error:  0.00283875
out:     [  0.075]
correct: [  1.000]
error:  0.42744569
out:     [  0.077]
correct: [  1.000]
error:  0.42616996
out:     [  0.076]
correct: [  0.000]
error:  0.00291185
out:     [  0.076]
correct: [  1.000]
error:  0.42664586
out:     [  0.075]
correct: [  1.000]
error:  0.42800026
out:     [  0.076]
correct: [  1.000]
error:  0.42719380
out:     [  0.076]
correct: [  0.000]
error:  0.00286796
out:     [  0.076]
correct: [  0.000]
error:  0.00286642
out:     [  0.076]
correct: [  1.000]
error:  0.42696969
out:     [  0.076]
correct: [  0.000]
error:  0.00292401
out:     [  0.074]
correct: [  0.000]
error:  0.00274975
out:     [  0.076]
correct: [  0.000]
error:  0.00286129

I have tried altering the learning rate, weight decay, momentum, number of hidden units, number of hidden layers, class of the hidden layers, and class of the output layer to resolve this, but in every case it gives the same output for every input that comes from the training data.

I think I should run it for more than 8000 iterations, because when I was building a neural network for 'XOR' it took at least 700 iterations before the errors dropped to the nano scale. The training data size for 'XOR' was only 4, whereas in this case it is 892. So I ran 8000 iterations on 10% of the original data (training data size now 89), and even then it gave the same output for every input in the training data. And since I want to classify the input as '0' or '1', if I use Softmax as the output layer class, it always gives '1' as the output.

No matter which configuration (number of hidden units, class of output layer, learning rate, class of hidden layer, momentum) I used for 'XOR', it more or less started converging in every case.

Is it possible that there is some configuration that will finally yield lower error rates? At least some configuration so that it won't give the same output for all inputs in the training data?

I ran it for 80,000 iterations (training data size is 89). Output sample:

Testing on data:
out:     [  0.340]
correct: [  0.000]
error:  0.05772102
out:     [  0.399]
correct: [  0.000]
error:  0.07954010
out:     [  0.478]
correct: [  1.000]
error:  0.13600274
out:     [  0.347]
correct: [  0.000]
error:  0.06013008
out:     [  0.500]
correct: [  0.000]
error:  0.12497886
out:     [  0.468]
correct: [  1.000]
error:  0.14177601
out:     [  0.377]
correct: [  0.000]
error:  0.07112816
out:     [  0.349]
correct: [  0.000]
error:  0.06100758
out:     [  0.380]
correct: [  1.000]
error:  0.19237095
out:     [  0.362]
correct: [  0.000]
error:  0.06557341
out:     [  0.335]
correct: [  0.000]
error:  0.05607577
out:     [  0.381]
correct: [  0.000]
error:  0.07247926
out:     [  0.355]
correct: [  1.000]
error:  0.20832669
out:     [  0.382]
correct: [  1.000]
error:  0.19116165
out:     [  0.440]
correct: [  0.000]
error:  0.09663233
out:     [  0.336]
correct: [  0.000]
error:  0.05632861

Average error: 0.112558819082

('Max error:', 0.21803000849096299, 'Median error:', 0.096632332865968451)

It's giving all outputs within the range (0.33, 0.5).

Jack Smith, asked Oct 08 '12


1 Answer

There is yet another neural network metric which you did not mention: the number of adaptable weights. I'm starting the answer with this because it is related to the number of hidden layers and the number of units in them.

For good generalization, the number of weights must be much less than Np/Ny, where Np is the number of training patterns and Ny is the number of network outputs. What exactly "much less" means is debatable; I suggest a difference of several times, say a factor of 10. For the roughly 1000 patterns and 1 output in your task, this implies about 100 weights.

It does not make sense to use 2 hidden layers. 1 is sufficient for most tasks where non-linearity is involved. In your case, the additional hidden layer only makes a difference by hurting overall performance. So if 1 hidden layer is used, the number of neurons in it can be approximated as the number of weights divided by the number of inputs, that is 100/7 ≈ 14.
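To make the arithmetic concrete, here is a small sketch of that estimate in plain Python (the factor of 10 and the ~1000 patterns are the numbers from above; the final weight count assumes a fully connected 7-14-1 net with bias units, which is what buildNetwork with bias=True produces):

Np, Ny = 1000, 1                   # approximate pattern count and number of outputs
budget = Np // (Ny * 10)           # "much less than Np/Ny", factor of 10 -> ~100 weights

n_inputs = 7
n_hidden = budget // n_inputs      # ~14 hidden units in a single hidden layer

# adaptable weights of a fully connected 7-14-1 net with bias units
n_weights = n_inputs * n_hidden + n_hidden + n_hidden * 1 + 1
print(budget, n_hidden, n_weights) # 100, 14, 127 -- close enough to the rough budget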

I suggest using the same activation function in all neurons, either hyperbolic tangent or sigmoid everywhere. Your output values (0 or 1) are actually already in the sigmoid's output range. In any case, you can improve NN performance by normalizing the input data to fit into [0, 1] in all dimensions. Of course, normalize each feature on its own.
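For the per-feature normalization, a minimal numpy sketch could look like this (assuming data is the 892x8 array from the question, with the target in column 0):

import numpy as np

features = data[:, 1:].astype(float)        # the 7 feature columns
col_min = features.min(axis=0)
col_max = features.max(axis=0)
span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns
features_scaled = (features - col_min) / span               # every feature now in [0, 1]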

If you can do this with the PyBrain lib, start learning with a greater learning rate and then decrease it smoothly in proportion to the current step: LR * (N - i)/N, where i is the current step, N is a step limit, and LR is the initial learning rate.

As @Junuxx suggested, output the current error every M steps (if possible), just to make sure your program works as expected. Stop learning if the difference between errors in successive steps becomes less than a threshold. For a first rough estimation of the proper NN parameters, set the threshold to 0.1-0.01 (there is no need for a "nano scale").
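A minimal sketch of such a loop, combining the decaying learning rate from the previous paragraph with this stopping rule (N, LR, M and the threshold are example values; net and dataset are the objects from the question's code; to avoid relying on PyBrain internals, the sketch simply builds a fresh BackpropTrainer with the new rate each epoch, which is wasteful but uses only the public API):

from pybrain.supervised.trainers import BackpropTrainer

N = 2000          # upper limit on epochs (example value)
LR = 0.1          # initial learning rate (example value)
threshold = 0.01  # stop when successive epoch errors differ by less than this
M = 50            # print the error every M epochs

prev_error = None
for i in range(N):
    lr = LR * (N - i) / float(N)                 # linearly decaying learning rate
    trainer = BackpropTrainer(net, dataset, learningrate=lr,
                              momentum=0.9, weightdecay=0.02)
    error = trainer.train()                      # one epoch; returns the average error
    if i % M == 0:
        print(i, lr, error)
    if prev_error is not None and abs(prev_error - error) < threshold:
        print('converged at epoch', i, 'error', error)
        break
    prev_error = error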

The fact that you ran the network on 89 patterns for 80,000 steps and got the results you have is strange. Please double check that you are passing the correct data to the NN, and examine what the error values you provided actually mean. Possibly either the errors or the outputs displayed are taken from the wrong place. I think 10,000 steps should be far more than enough to get acceptable results for 89 patterns.
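As a quick sanity check of what actually went into the dataset, something like this should work with SupervisedDataSet-style datasets (if I remember the PyBrain interface correctly; dataset is the one built in the question's code):

# print the first few (input, target) pairs to verify the feature/target mapping
for inp, tgt in list(dataset)[:5]:
    print(inp, tgt)

# also check the overall shapes and the class balance of the targets
print(dataset['input'].shape, dataset['target'].shape)
print('share of 1s:', dataset['target'].mean())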

As for this specific task, I think a SOM net could be another option (possibly better suited than backprop).

As a side note, I'm not familiar with PyBrain, but I have coded some NNs in C++ and other languages, and the amount of training you describe looks highly excessive.

Stan, answered Nov 08 '22