Python - how to normalize time-series data

I have a dataset of time-series examples. I want to calculate the similarity between various time-series examples, but I do not want differences due to scaling to count (i.e. I want to compare the shapes of the time series, not their absolute values). To this end, I need a way of normalizing the data, that is, making all of the time-series examples fall within a fixed range, e.g. [0, 100]. Can anyone tell me how this can be done in Python?

user1893354 asked Oct 08 '13 19:10



People also ask

How do you normalize a time series?

We can use a rescaling method called “normalization” to put every variable on the same scale. First, we calculate the mean and standard deviation for the original variables (Table 2). To get the rescaled value we subtract the mean from the original value and then divide by the standard deviation.
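
The mean/standard-deviation rescaling described above could be sketched as follows. This is a minimal illustration that assumes NumPy and a one-dimensional series; the function name `zscore` and the handling of a constant series are assumptions made for this example, not part of the original text.

```python
import numpy as np

def zscore(timeseries):
    """Subtract the mean, then divide by the (population) standard deviation."""
    ts = np.asarray(timeseries, dtype=float)
    std = ts.std()
    if std == 0:
        # Assumption: a constant series has no shape, so map it to all zeros
        return np.zeros_like(ts)
    return (ts - ts.mean()) / std

a = np.array([2.0, 4.0, 6.0, 5.0])
b = 3.0 * a + 10.0  # same shape as a, different scale and offset
print(zscore(a))
print(zscore(b))    # matches zscore(a) up to floating-point rounding
```

Because both the offset and the scale are removed, two series that differ only by a positive linear transformation end up (numerically) identical.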

How do you normalize a dataset in Python?

Using MinMaxScaler() to normalize data in Python: this is a popular choice for normalizing datasets. The output values lie between 0 and 1. MinMaxScaler also gives you the option to select the feature range; by default, the range is set to (0, 1).


2 Answers

Assuming that your time series is a NumPy array, try something like this:

(timeseries-timeseries.min())/(timeseries.max()-timeseries.min())

This will confine your values to the range [0, 1]; multiply the result by 100 to get the range [0, 100].
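
For the [0, 100] range mentioned in the question, the same idea can be wrapped in a small helper. This is only a sketch: the function name `rescale`, its `low`/`high` parameters, and the treatment of a constant series (which would otherwise divide by zero) are assumptions for illustration.

```python
import numpy as np

def rescale(timeseries, low=0.0, high=100.0):
    """Min-max rescale a 1-D series into the interval [low, high]."""
    ts = np.asarray(timeseries, dtype=float)
    span = ts.max() - ts.min()
    if span == 0:
        # Assumption: map a constant series to the middle of the target range
        return np.full(ts.shape, (low + high) / 2.0)
    return low + (ts - ts.min()) / span * (high - low)

a = np.array([1.0, 2.0, 3.0])
b = a * 1000 + 5      # same shape as a, very different scale
print(rescale(a))     # values 0, 50, 100
print(rescale(b))     # values 0, 50, 100
```

Note that the minimum and maximum are sensitive to outliers: a single spike will compress the rest of the series into a narrow band.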

Trond Kristiansen answered Oct 27 '22 01:10



The solutions given work well for series that have no upward or downward trend (i.e. stationary series). For financial time series (or any other series with a bias), the formula given is not appropriate: the series should first be detrended, or the scaling should be based only on the latest 100-200 samples.
And if the time series doesn't come from a normal distribution (as is the case in finance), it is advisable to apply a nonlinear function (a standard CDF, for example) to compress the outliers.
The book by Aronson and Masters (Statistically Sound Machine Learning for Algorithmic Trading) uses the following formula (on 200-day chunks):

V = 100 * N(0.5 * (X - F50) / (F75 - F25)) - 50

Where:
X: data point
F50: median (50th percentile) of the latest 200 points
F75: 75th percentile of the latest 200 points
F25: 25th percentile of the latest 200 points
N: standard normal CDF
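
A rough sketch of how this formula might be applied is shown below. It is not taken from the book: the function names, the use of a trailing window that includes the current point, the choice of the median for F50, and the handling of a zero interquartile range are all assumptions made for this example.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def robust_cdf_scale(x, window=200, c=0.5):
    """Apply V = 100 * N(c * (X - F50) / (F75 - F25)) - 50 over a trailing window.

    The first window - 1 outputs are NaN because no full window exists yet.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for i in range(window - 1, len(x)):
        chunk = x[i - window + 1 : i + 1]  # latest `window` points, current one included
        f25, f50, f75 = np.percentile(chunk, [25, 50, 75])
        iqr = f75 - f25
        if iqr == 0:
            # Assumption: a flat window carries no information, so return the midpoint
            out[i] = 0.0
        else:
            out[i] = 100.0 * normal_cdf(c * (x[i] - f50) / iqr) - 50.0
    return out

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(size=500))  # synthetic random walk, for illustration only
scaled = robust_cdf_scale(prices)
print(np.nanmin(scaled), np.nanmax(scaled))  # both lie within [-50, 50]
```

Since the normal CDF maps any input into (0, 1), V always falls between -50 and 50, and extreme values are squeezed toward those bounds rather than stretching the scale.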

FSRubyc answered Oct 27 '22 01:10
