Simple MLP time series training yields unexpected mean-line results

I'm trying to play around with simple time series predictions. Given a number of inputs (1-minute ticks), the net should attempt to predict the next one. I've trained 3 nets with different settings to illustrate my problem:

[Image: predictions plot (left) and training stats for the three nets (right)]

On the right you can see the 3 trained MLPs (randomly named and color-coded) with some training stats. On the left is a plot of the predictions made by those nets, with the actual validation data in white. This plot was made by stepping through each tick of the validation data (white), feeding the previous 30|4|60 (Nancy|Kathy|Wayne) ticks to the net, and plotting its prediction in place of the current tick.

Multilayer perceptron settings (Nancy|Kathy|Wayne):

Geometry: 2x30|4|60 input nodes -> 30|4|60 hidden layer nodes -> 2 outputs
Number of epochs: 10|5|10
Learning rate: 0.01
Momentum: 0.5|0.9|0.5
Nonlinearity: Rectify
Loss: Squared Error

It seems that the more training is applied, the more the predictions converge to some kind of mean line, which is not what I was expecting at all. I was expecting the predictions to stay somewhat close to the validation data, with some margin of error.
Am I picking the wrong model, misunderstanding some core concepts of machine learning, or doing something wrong in lasagne/theano?

Quick links to the most relevant (in my opinion) code parts:

  • MLP Geometry definition
  • Functions compilation
  • Training and validation
  • Instantiating MLP
  • CSV training data parsing

And here are the full sources, more or less:

  • Data used for training, in the format date;open;high;low;close;volume (only date, high and low are used)
  • MLP module
  • GUI module's relevant MLP interaction parts
Asked Oct 06 '16 by Max Yari


1 Answer

First of all, I want to commend your use of the rectified nonlinearity. According to Geoffrey Hinton, co-inventor of the Boltzmann machine, the rectified linear unit is the best fit for the activity of the human brain.

But for the other parts, I propose you change the NN architecture. For stock market predictions you should use some recurrent NN: the easiest candidates are Elman or Jordan networks, or you can try something more complicated like an LSTM network.

Another piece of advice: I propose modifying what you feed into the NN. In general, I recommend applying scaling and normalization. For example, don't feed the NN raw prices. Modify them in one of the following ways (these proposals are not written in stone):

  1. Feed the NN percentage changes of the price.
  2. If you feed the NN 30 values and want to predict two values, then take the minimum of all 32 values, subtract it from each of the 32 values, and predict the last 2 based on the first 30. Then just add the minimum of the 32 values back to the result.

Don't feed raw dates into the NN; they tell the NN nothing useful for making predictions. Instead, feed the date and time as categorical values. Categorical means that you transform the datetime into more than one input. For example, instead of giving the NN 2016/09/10, you can consider some of the following.

  1. The year of trading will most probably not give any useful information, so you can omit it.
  2. 09 stands for the month number, i.e. September. You could feed the NN the month number directly, but I strongly recommend making 12 inputs and, in the case of January, setting the first input to 1 and the other eleven to zero. This way you'll train your network to separate the trading period in January from the trading period in June or December. I also propose a categorical input for the day of the week in the same way, because trading on Monday differs from trading on Friday, especially on the day of the NFP release.
  3. For hours I propose encoding by periods of 6-8 hours. It will help the network take into account the different trading sessions: Asia, Frankfurt, London, New York.
  4. If you decide to feed the NN some indicators, then for some of them consider thermometer encoding. Thermometer encoding is usually needed for indicators like ADX.
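A minimal sketch of this categorical datetime encoding in Python. The exact 6-hour session boundaries and the feature layout are my own assumptions, not from the answer; adjust them to the trading sessions you care about.

```python
from datetime import datetime

import numpy as np

def encode_datetime(dt):
    """One-hot (categorical) encoding of a timestamp.

    Returns 12 month inputs + 5 weekday inputs (trading days only) +
    4 session inputs (6-hour blocks). The year is dropped entirely,
    as suggested above.
    """
    month = np.zeros(12)
    month[dt.month - 1] = 1.0

    weekday = np.zeros(5)          # Monday .. Friday
    if dt.weekday() < 5:           # weekends leave all five at zero
        weekday[dt.weekday()] = 1.0

    session = np.zeros(4)          # 00-06, 06-12, 12-18, 18-24
    session[dt.hour // 6] = 1.0

    return np.concatenate([month, weekday, session])

features = encode_datetime(datetime(2016, 9, 10, 14, 30))
print(features.shape)  # (21,)
```

Instead of one opaque date input, the net now gets 21 binary inputs it can actually learn from.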

Regarding your question in the comments about how to use the minimum, here is a simplified example. Let's say you want to train the NN on the following EUR/USD close prices:
1.1122, 1.1132, 1.1152, 1.1156, 1.1166, 1.1173, 1.1153, 1.1150, 1.1152, 1.1159. Instead of a learning window of size 30, I'll demonstrate with a window size of 3 (just for simplicity's sake) and a prediction window size of 2.
In total, the input to the NN is 3 values and the output is 2. For learning we will use the first 5 values:
1.1122, 1.1132, 1.1152, 1.1156, 1.1166
then the next 5 values:
1.1132, 1.1152, 1.1156, 1.1166, 1.1173
In the first window the minimal value is 1.1122.
Subtract 1.1122 from each value:
0, 0.0010, 0.0030, 0.0034, 0.0044. As input you feed the NN 0, 0.0010, 0.0030. As output you expect 0.0034, 0.0044. If you want to make it learn much faster, feed the NN normalized and scaled values. Then each time you'll need to de-normalize and de-scale the predictions.
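The windowing above can be sketched in Python using the same prices (`make_windows` is a hypothetical helper name, not from the linked sources):

```python
import numpy as np

def make_windows(prices, in_len=3, out_len=2):
    """Sliding-window samples with per-window minimum subtraction,
    as in the worked example above."""
    X, y, mins = [], [], []
    total = in_len + out_len
    for i in range(len(prices) - total + 1):
        window = np.asarray(prices[i:i + total])
        m = window.min()
        shifted = window - m
        X.append(shifted[:in_len])
        y.append(shifted[in_len:])
        mins.append(m)  # keep the minimum to add back to predictions later
    return np.array(X), np.array(y), np.array(mins)

prices = [1.1122, 1.1132, 1.1152, 1.1156, 1.1166,
          1.1173, 1.1153, 1.1150, 1.1152, 1.1159]
X, y, mins = make_windows(prices)
print(np.round(X[0], 4))  # [0.    0.001 0.003]
print(np.round(y[0], 4))  # [0.0034 0.0044]
```

A predicted output for window `i` is converted back to a price by adding `mins[i]`.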

Another way is to feed the NN percentage changes of the price. Let me know if you need a sample for it.
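A minimal sketch of the percentage-change representation (my own illustration; `pct_changes` is a hypothetical helper):

```python
import numpy as np

def pct_changes(prices):
    """Percentage change from each tick to the next, the alternative
    input representation mentioned above."""
    p = np.asarray(prices, dtype=float)
    return (p[1:] - p[:-1]) / p[:-1] * 100.0

changes = pct_changes([1.1122, 1.1132, 1.1152])
print(np.round(changes, 4))  # roughly [0.0899 0.1797]
```

Percentage changes are naturally centered near zero and roughly scale-free, so they need much less additional normalization than raw prices.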

And one more important piece of advice: don't use the NN alone for making trades. Never!!! A better way is to invent some system with some percentage of success, for example 30%, and then use the NN to increase that success rate, say to 60%.

I also want to provide an example of thermometer encoding for some indicators. Consider the ADX indicator and the following examples:

    >10 >20 >30 >40
a.   1   0   0   0
b.   1   1   0   0

Example a is the input for an ADX value greater than 10 (but not greater than 20); example b is the input for an ADX value greater than 20. You can modify thermometer encoding to provide inputs for the stochastic oscillator. Usually the stochastic is meaningful in the ranges 0-20 and 80-100, and only in seldom cases in the range 20-80. But as always, you can try and see.
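A sketch of thermometer encoding with the ADX thresholds from the example above:

```python
import numpy as np

def thermometer(value, thresholds=(10, 20, 30, 40)):
    """Thermometer encoding: one input per threshold, set to 1 for
    every threshold the value exceeds."""
    return np.array([1.0 if value > t else 0.0 for t in thresholds])

print(thermometer(15))  # [1. 0. 0. 0.]  -> example a
print(thermometer(25))  # [1. 1. 0. 0.]  -> example b
```

Unlike one-hot encoding, neighboring values share active inputs, which preserves the ordering of the indicator for the network.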

Answered Nov 15 '22 by Yuriy Zaletskyy