I am trying to use 5 years of consecutive, historical data to forecast values for the following year.
My input data input_04_08 looks like this, where the first column is the day of the year (1 to 365) and the second column is the recorded input:
1,2
2,2
3,0
4,0
5,0
My output data output_04_08 looks like this: a single column with the recorded output on that day of the year.
27.6
28.9
0
0
0
I then normalise the values to between 0 and 1, so the first sample given to the network looks like this:
Number of training patterns: 1825
Input and output dimensions: 2 1
First sample (input, target):
[ 0.00273973 0.04 ] [ 0.02185273]
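(The normalisation is per column: day 1 becomes 1/365 ≈ 0.00273973, the input value 2 is divided by its column's maximum to give 0.04, and the target 27.6 is likewise divided by the maximum recorded output to give 0.02185273.)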
I have implemented the following code in PyBrain:
import numpy
from pybrain.datasets import SupervisedDataSet
from pybrain.structure import FeedForwardNetwork, LinearLayer, TanhLayer, FullConnection
from pybrain.supervised.trainers import BackpropTrainer

# Load five years of training data (04-08) and one year of test data (09)
input_04_08 = numpy.loadtxt('./data/input_04_08.csv', delimiter=',')
input_09 = numpy.loadtxt('./data/input_09.csv', delimiter=',')
output_04_08 = numpy.loadtxt('./data/output_04_08.csv', delimiter=',')
output_09 = numpy.loadtxt('./data/output_09.csv', delimiter=',')

# Normalise every column to [0, 1] by dividing by its maximum
input_04_08 = input_04_08 / input_04_08.max(axis=0)
input_09 = input_09 / input_09.max(axis=0)
output_04_08 = output_04_08 / output_04_08.max(axis=0)
output_09 = output_09 / output_09.max(axis=0)
ds = SupervisedDataSet(2, 1)
for x in range(0, 1825):
    ds.addSample(input_04_08[x], output_04_08[x])
n = FeedForwardNetwork()
inLayer = LinearLayer(2)
hiddenLayer = TanhLayer(25)
outLayer = LinearLayer(1)
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()
trainer = BackpropTrainer(n, ds, learningrate=0.01, momentum=0.1)
for epoch in range(0, 100000000):
    error = trainer.train()
    if epoch % 10000000 == 0:
        print 'Epoch: ', epoch
        print 'Error: ', error
result = numpy.array([n.activate(x) for x in input_09])
This gave me a final error of 0.00153840123381, and a plot of the predictions for the test year that, admittedly, looks good. However, having read more about LSTM (Long Short-Term Memory) neural networks and their applicability to time series data, I am trying to build one.
Below is my code:
import numpy
from pybrain.datasets import SequentialDataSet
from pybrain.structure import LSTMLayer
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork

# Same loading and normalisation as in the feedforward version
input_04_08 = numpy.loadtxt('./data/input_04_08.csv', delimiter=',')
input_09 = numpy.loadtxt('./data/input_09.csv', delimiter=',')
output_04_08 = numpy.loadtxt('./data/output_04_08.csv', delimiter=',')
output_09 = numpy.loadtxt('./data/output_09.csv', delimiter=',')

input_04_08 = input_04_08 / input_04_08.max(axis=0)
input_09 = input_09 / input_09.max(axis=0)
output_04_08 = output_04_08 / output_04_08.max(axis=0)
output_09 = output_09 / output_09.max(axis=0)
ds = SequentialDataSet(2, 1)
for x in range(0, 1825):
    ds.newSequence()
    ds.appendLinked(input_04_08[x], output_04_08[x])
fnn = buildNetwork( ds.indim, 25, ds.outdim, hiddenclass=LSTMLayer, bias=True, recurrent=True)
trainer = BackpropTrainer(fnn, ds, learningrate=0.01, momentum=0.1)
for epoch in range(0, 10000000):
    error = trainer.train()
    if epoch % 100000 == 0:
        print 'Epoch: ', epoch
        print 'Error: ', error
result = numpy.array([fnn.activate(x) for x in input_09])
This results in a final error of 0.000939719502501, but this time, when I feed in the test data, the output plot looks terrible.
I have looked around at pretty much all the PyBrain questions on here; a few stood out, but they haven't helped me figure things out.
I have also read a few blog posts; these helped further my understanding a bit, but obviously not enough.
Naturally, I have also gone through the PyBrain docs, but couldn't find much to help with the sequential dataset, bar here.
Any ideas/tips/direction would be welcome.
I think what happened here is that you tried to assign hyperparameter values according to some rule of thumb, which worked for the first case but didn't for the second.
1) The error estimate that you're looking at is an optimistic estimate of the prediction error, because it is computed on the training set. The actual prediction error is high, but because you didn't test your model on unseen data there's no way of knowing it. The Elements of Statistical Learning gives a nice description of this phenomenon; I would highly recommend this book, and you can get it online for free.
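A quick way to see this (a minimal sketch, reusing the trained network n and the normalised arrays from your question) is to compare the mean squared error on the training data with the error on the held-out 2009 data:

import numpy

# Assumes `n` (the trained network) and the normalised arrays from the
# question are in scope; compare training error with held-out test error.
train_pred = numpy.array([n.activate(x) for x in input_04_08])
test_pred = numpy.array([n.activate(x) for x in input_09])

train_mse = numpy.mean((train_pred.ravel() - output_04_08) ** 2)
test_mse = numpy.mean((test_pred.ravel() - output_09) ** 2)

print 'Training MSE:', train_mse   # optimistic
print 'Test MSE:    ', test_mse    # the number that actually matters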
2) To get an estimator with a low prediction error you need to perform hyperparameter tuning. E.g. the number of hidden nodes, the learning rate and the momentum should be varied and tested on unseen data to find which combination leads to the lowest prediction error. scikit-learn has GridSearchCV and RandomizedSearchCV to do just that, but they only work on sklearn's estimators. You can roll your own estimator, though, which is described in the documentation. Personally, I think that model selection and model evaluation are two different tasks. For the first one you can just run a single GridSearchCV or RandomizedSearchCV and get a set of the best hyperparameters for your task. For model evaluation you need to run a more sophisticated analysis, such as nested cross-validation, or even repeated nested cross-validation if you want an even more accurate estimate.
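Since PyBrain networks aren't sklearn estimators, a hand-rolled search is the simplest option. Here is a sketch, assuming your training set ds and hypothetical validation arrays val_inputs/val_targets that you would split out of the 2004-2008 data (not the 2009 test set):

import itertools
import numpy
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork

# Try every combination of hidden-layer size, learning rate and momentum
best = (None, float('inf'))
for hidden, lr, mom in itertools.product([5, 15, 25], [0.1, 0.01, 0.001], [0.0, 0.1]):
    net = buildNetwork(2, hidden, 1, bias=True)
    trainer = BackpropTrainer(net, ds, learningrate=lr, momentum=mom)
    for _ in range(100):          # small, fixed training budget per candidate
        trainer.train()
    pred = numpy.array([net.activate(x) for x in val_inputs]).ravel()
    mse = numpy.mean((pred - val_targets) ** 2)   # error on held-out data
    if mse < best[1]:
        best = ((hidden, lr, mom), mse)
print 'Best (hidden, lr, momentum):', best[0], 'validation MSE:', best[1]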
3) I don't know much about LSTM networks, but check the size of the hidden layer: in the first example you used 25 hidden nodes, and an LSTM layer with far fewer units may not be enough to learn the pattern. You could also drop the output bias, as done in the PyBrain example.
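In PyBrain that combination would look something like this (a sketch; outputbias is the buildNetwork keyword that controls the bias on the output layer):

from pybrain.structure import LSTMLayer
from pybrain.tools.shortcuts import buildNetwork

# LSTM network with 25 hidden units and no bias on the output layer
fnn = buildNetwork(2, 25, 1, hiddenclass=LSTMLayer,
                   outputbias=False, recurrent=True)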
P.S. I think this question actually belongs on http://stats.stackexchange.com, where you're likely to get a more detailed answer to your problem.
EDIT: I just noticed that you're training the model for 10 million epochs! I think that's a lot, and probably part of the overfitting problem. It's a good idea to implement early stopping, i.e. to stop training once some predefined error is achieved.
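PyBrain can do a simple form of this for you: trainUntilConvergence holds out part of the training data as a validation set and stops once the validation error stops improving. A sketch (the parameter values are illustrative, not tuned):

from pybrain.supervised.trainers import BackpropTrainer

trainer = BackpropTrainer(fnn, ds, learningrate=0.01, momentum=0.1)

# Holds out 25% of `ds` for validation; stops once the validation error
# has not improved for `continueEpochs` epochs, or after maxEpochs.
train_errors, val_errors = trainer.trainUntilConvergence(
    maxEpochs=1000, continueEpochs=10, validationProportion=0.25)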