 

Should I normalize my features before throwing them into an RNN?

I am playing with some demos of recurrent neural networks.

I noticed that the scales of my data columns differ a lot, so I am considering doing some preprocessing before I feed the data batches into my RNN. The close column is the target whose future values I want to predict.

     open   high    low     volume  price_change  p_change     ma5    ma10  \
0  20.64  20.64  20.37  163623.62         -0.08     -0.39  20.772  20.721
1  20.92  20.92  20.60  218505.95         -0.30     -1.43  20.780  20.718
2  21.00  21.15  20.72  269101.41         -0.08     -0.38  20.812  20.755
3  20.70  21.57  20.70  645855.38          0.32      1.55  20.782  20.788
4  20.60  20.70  20.20  458860.16          0.10      0.48  20.694  20.806

     ma20      v_ma5     v_ma10     v_ma20  close
0  20.954  351189.30  388345.91  394078.37  20.56
1  20.990  373384.46  403747.59  411728.38  20.64
2  21.022  392464.55  405000.55  426124.42  20.94
3  21.054  445386.85  403945.59  473166.37  21.02
4  21.038  486615.13  378825.52  461835.35  20.70

My question is: is preprocessing the data with, say, StandardScaler from sklearn necessary in my case? And why?

(You are welcome to edit my question)

— Kulbear, asked Apr 18 '17


3 Answers

It will be beneficial to normalize your training data. Feeding features with widely different scales to your model will cause the network to weight the features unequally, which can lead to a false prioritization of some features over others in the learned representation.

Although the whole discussion on data preprocessing is contested, both on when exactly it is necessary and on how to correctly normalize the data for a given model and application domain, there is a general consensus in machine learning that running a mean subtraction as well as a general normalization preprocessing step is helpful.

With mean subtraction, the mean of every individual feature is subtracted from the data, which from a geometric point of view can be interpreted as centering the data cloud around the origin. This holds along every dimension.

Normalizing the data after the mean subtraction step brings every data dimension to approximately the same scale. Note that, as mentioned above, the different features lose any prioritization over each other after this step. If you have good reason to think that the different scales of your features carry information the network needs in order to truly understand the underlying patterns in your dataset, then normalization will be harmful. A standard approach is to scale the inputs to have a mean of 0 and a variance of 1.
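
As a minimal sketch of that standard approach, assuming the DataFrame from the question is named df (my assumption, as is the 80/20 split) and using sklearn's StandardScaler:

    from sklearn.preprocessing import StandardScaler

    # Separate the prediction target from the input features.
    features = df.drop(columns=['close'])

    # Fit the scaler on the training portion only, so statistics from
    # the test data do not leak into the preprocessing step.
    split = int(len(features) * 0.8)
    scaler = StandardScaler()
    train_scaled = scaler.fit_transform(features[:split])  # mean 0, variance 1 per column
    test_scaled = scaler.transform(features[split:])       # reuse the training statistics

Fitting on the training split and reusing those statistics at test time keeps the preprocessing consistent with what the network saw during training.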

Further preprocessing operations may be helpful in specific cases, such as performing PCA or whitening on your data. Have a look at the excellent CS231n notes ("Setting up the data and the model") for further reference on these topics as well as for a more detailed explanation of the points above.
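
For reference, a sketch of the PCA/whitening recipe along the lines of those notes; the stand-in X is my assumption for the real (num_samples, num_features) matrix:

    import numpy as np

    X = np.random.randn(100, 13)         # stand-in for the real feature matrix
    X = X - np.mean(X, axis=0)           # mean subtraction (center the data)
    cov = np.dot(X.T, X) / X.shape[0]    # covariance matrix of the features
    U, S, _ = np.linalg.svd(cov)         # eigenbasis of the covariance
    X_rot = np.dot(X, U)                 # decorrelate the data (PCA rotation)
    X_white = X_rot / np.sqrt(S + 1e-5)  # whiten: unit variance per component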

— Kevin Katzke, answered Dec 07 '22


I found this paper: https://arxiv.org/abs/1510.01378. If you normalize your data, it may improve convergence, so you will get shorter training times.

— Cristi, answered Dec 07 '22


Definitely yes. Most neural networks work best with data between 0 and 1 or between -1 and 1 (depending on the output function). Also, when some inputs are on a larger scale than others, the network will "think" they are more important. This can make learning very slow, because the network must first lower the weights on those inputs.
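
A hedged sketch of that kind of scaling with sklearn's MinMaxScaler; the (-1, 1) range assumes a tanh-style output, and X is a stand-in for the real inputs:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X = np.random.randn(100, 13)                  # stand-in for the real inputs
    scaler = MinMaxScaler(feature_range=(-1, 1))  # use (0, 1) for sigmoid-style outputs
    X_scaled = scaler.fit_transform(X)            # every column now lies in [-1, 1]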

— TakMashido, answered Dec 08 '22