Based on PyBrain's tutorials I managed to knock together the following code:
#!/usr/bin/env python2
# coding: utf-8
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
n = FeedForwardNetwork()
inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()
ds = SupervisedDataSet(2, 1)
ds.addSample((0, 0), (0,))
ds.addSample((0, 1), (1,))
ds.addSample((1, 0), (1,))
ds.addSample((1, 1), (0,))
trainer = BackpropTrainer(n, ds)
# trainer.train()
trainer.trainUntilConvergence()
print n.activate([0, 0])[0]
print n.activate([0, 1])[0]
print n.activate([1, 0])[0]
print n.activate([1, 1])[0]
It's supposed to learn the XOR function, but the results seem quite random:
0.208884929522
0.168926515771
0.459452834043
0.424209192223
or
0.84956138664
0.888512762786
0.564964077401
0.611111147862
Strictly speaking, a neural network (also called an “artificial neural network”) is a type of machine learning model usually used in supervised learning. A network like the one above consists of three layers: an input layer, i; a hidden layer, j; and an output layer, k. The standard method for training neural networks is stochastic gradient descent (SGD). Neural networks generally perform supervised learning tasks, building knowledge from data sets where the right answer is provided in advance; the network then learns by tuning its weights until it can reproduce those answers on its own, increasing the accuracy of its predictions.
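To make that concrete, here is a minimal sketch of a single stochastic gradient descent step for a network with a sigmoid hidden layer and a linear output (plain numpy, not PyBrain; the 2-3-1 shapes, the squared-error loss, and the learning rate of 0.1 are illustrative choices, not anything taken from the question):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# one training sample (input x, target t)
x = np.array([0.0, 1.0])
t = np.array([1.0])

# small random weights for a 2-3-1 network (biases omitted for brevity)
W1 = np.random.randn(3, 2) * 0.1   # input  -> hidden
W2 = np.random.randn(1, 3) * 0.1   # hidden -> output
lr = 0.1                           # learning rate

# forward pass
h = sigmoid(W1.dot(x))             # hidden activations
y = W2.dot(h)                      # linear output

# backward pass: gradients of the squared error 0.5 * (y - t)**2
delta_out = y - t                              # error signal at the output
delta_hid = W2.T.dot(delta_out) * h * (1 - h)  # back-propagated through the sigmoid

# stochastic gradient descent update
W2 -= lr * np.outer(delta_out, h)
W1 -= lr * np.outer(delta_hid, x)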
There are four problems with your approach, all easy to identify after reading the Neural Network FAQ:
Why use a bias/threshold?: you should add a bias node. Without a bias the learning is very limited: the separating hyperplane represented by the network can only pass through the origin. With a bias node it can move freely and fit the data better:
bias = BiasUnit()  # BiasUnit also needs to be added to the pybrain.structure import at the top
n.addModule(bias)
bias_to_hidden = FullConnection(bias, hiddenLayer)
n.addConnection(bias_to_hidden)
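As an aside, PyBrain's buildNetwork shortcut can wire up the same 2-3-1 topology with the bias already in place (a sketch; bias=True and a sigmoid hidden layer are the defaults, so the keyword arguments are only there to make the choices explicit):
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import SigmoidLayer, LinearLayer

# 2 inputs, 3 sigmoid hidden units, 1 linear output, bias included
net = buildNetwork(2, 3, 1, bias=True,
                   hiddenclass=SigmoidLayer, outclass=LinearLayer)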
Why not code binary inputs as 0 and 1?: all your samples lie in a single quadrant of the sample space. Move them so they are scattered around the origin:
ds = SupervisedDataSet(2, 1)
ds.addSample((-1, -1), (0,))
ds.addSample((-1, 1), (1,))
ds.addSample((1, -1), (1,))
ds.addSample((1, 1), (0,))
(Fix the validation code at the end of your script accordingly.)
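For example, the activation calls at the end of the script should use the same -1/1 encoding:
print n.activate([-1, -1])[0]   # expected to be close to 0
print n.activate([-1, 1])[0]    # expected to be close to 1
print n.activate([1, -1])[0]    # expected to be close to 1
print n.activate([1, 1])[0]     # expected to be close to 0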
The trainUntilConvergence method works using validation, and does something that resembles the early stopping method. This doesn't make sense for such a small dataset. Use trainEpochs instead; 1000 epochs is more than enough for the network to learn this problem:
trainer.trainEpochs(1000)
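If you want to watch the error fall instead of training blindly, you can call train() in a loop; in PyBrain, train() runs a single epoch over the dataset and returns its average error (a sketch, with the epoch count and print interval chosen arbitrarily):
for epoch in range(1000):
    error = trainer.train()           # one epoch over the dataset, returns the average error
    if epoch % 100 == 0:
        print 'epoch %4d  error %g' % (epoch, error)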
What learning rate should be used for backprop?: Tune the learning rate parameter. This is something you do every time you employ a neural network. In this case, a value of 0.1 or even 0.2 dramatically increases the learning speed:
trainer = BackpropTrainer(n, dataset=ds, learningrate=0.1, verbose=True)
(Note the verbose=True parameter. Observing how the error behaves is essential when tuning parameters.)
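One quick way to tune it is to train a fresh copy of the network for each candidate rate and compare the final errors (a sketch, assuming copy.deepcopy is enough to give every run the same starting weights; the candidate values are arbitrary):
import copy

for lr in (0.01, 0.1, 0.2, 0.5):
    net = copy.deepcopy(n)            # each run starts from identical initial weights
    t = BackpropTrainer(net, dataset=ds, learningrate=lr)
    for _ in range(1000):
        error = t.train()
    print 'learningrate %.2f  final error %g' % (lr, error)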
With these fixes I get consistent and correct results for the given network with the given dataset, and an error of less than 1e-23.
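Putting the four fixes together, the corrected script looks roughly like this (a sketch that keeps your 2-3-1 topology and only applies the changes above):
#!/usr/bin/env python2
# coding: utf-8
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, BiasUnit, FullConnection
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

# build the 2-3-1 network with a bias unit feeding the hidden layer
n = FeedForwardNetwork()
inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)
bias = BiasUnit()
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addModule(bias)
n.addOutputModule(outLayer)
n.addConnection(FullConnection(inLayer, hiddenLayer))
n.addConnection(FullConnection(bias, hiddenLayer))
n.addConnection(FullConnection(hiddenLayer, outLayer))
n.sortModules()

# XOR samples, encoded as -1/1 so they are scattered around the origin
ds = SupervisedDataSet(2, 1)
ds.addSample((-1, -1), (0,))
ds.addSample((-1, 1), (1,))
ds.addSample((1, -1), (1,))
ds.addSample((1, 1), (0,))

# fixed number of epochs and a tuned learning rate instead of trainUntilConvergence
trainer = BackpropTrainer(n, dataset=ds, learningrate=0.1, verbose=True)
trainer.trainEpochs(1000)

print n.activate([-1, -1])[0]
print n.activate([-1, 1])[0]
print n.activate([1, -1])[0]
print n.activate([1, 1])[0]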