How to predict time series in scikit-learn?

Tags:

Scikit-learn utilizes a very convenient approach based on fit and predict methods. I have time-series data in the format suited for fit and predict.

For example I have the following Xs:

[[1.0, 2.3, 4.5], [6.7, 2.7, 1.2], ..., [3.2, 4.7, 1.1]]

and the corresponding ys:

[[1.0], [2.3], ..., [7.7]]

These data have the following meaning. The values stored in ys form a time series. The values in Xs are corresponding time dependent "factors" that are known to have some influence on the values in ys (for example: temperature, humidity and atmospheric pressure).

Now, of course, I can use fit(Xs,ys). But then I get a model in which future values in ys depend only on factors and do not dependend on the previous Y values (at least directly) and this is a limitation of the model. I would like to have a model in which Y_n depends also on Y_{n-1} and Y_{n-2} and so on. For example I might want to use an exponential moving average as a model. What is the most elegant way to do it in scikit-learn

ADDED

As it has been mentioned in the comments, I can extend Xs by adding ys. But this way has some limitations. For example, if I add the last 5 values of y as 5 new columns to X, the information about time ordering of ys is lost. For example, there is no indication in X that values in the 5th column follows value in the 4th column and so on. As a model, I might want to have a linear fit of the last five ys and use the found linear function to make a prediction. But if I have 5 values in 5 columns it is not so trivial.

ADDED 2

To make my problem even more clear, I would like to give one concrete example. I would like to have a "linear" model in which y_n = c + k1*x1 + k2*x2 + k3*x3 + k4*EMOV_n, where EMOV_n is just an exponential moving average. How, can I implement this simple model in scikit-learn?

980

asked Dec 30 '13 14:12

Roman

1 Answers

According to Wikipedia, EWMA works well with stationary data, but it does not work as expected in the presence of trends, or seasonality. In those cases you should use a second or third order EWMA method, respectively. I decided to look at the pandas ewma function to see how it handled trends, and this is what I came up with:

import pandas, numpy as np ewma = pandas.stats.moments.ewma  # make a hat function, and add noise x = np.linspace(0,1,100) x = np.hstack((x,x[::-1])) x += np.random.normal( loc=0, scale=0.1, size=200 ) plot( x, alpha=0.4, label='Raw' )  # take EWMA in both directions with a smaller span term fwd = ewma( x, span=15 )          # take EWMA in fwd direction bwd = ewma( x[::-1], span=15 )    # take EWMA in bwd direction c = np.vstack(( fwd, bwd[::-1] )) # lump fwd and bwd together c = np.mean( c, axis=0 )          # average    # regular EWMA, with bias against trend plot( ewma( x, span=20 ), 'b', label='EWMA, span=20' )  # "corrected" (?) EWMA plot( c, 'r', label='Reversed-Recombined' )  legend(loc=8) savefig( 'ewma_correction.png', fmt='png', dpi=100 )

enter image description here

As you can see, the EWMA bucks the trend uphill and downhill. We can correct for this (without having to implement a second-order scheme ourselves) by taking the EWMA in both directions and then averaging. I hope your data was stationary!

answered Sep 28 '22 03:09

cjohnson318

Related questions
                            
                                SQLAlchemy: Hybrid expression with relationship
                            
                                Can I write native iPhone, Android, Windows, Blackberry apps using Python? [duplicate]
                            
                                Return results from multiple models with Django REST Framework
                            
                                Why isn't __new__ in Python new-style classes a class method?
                            
                                Plug in django-allauth as endpoint in django-rest-framework
                            
                                Difference between different ways to create celery task
                            
                                Flask App: Update progress bar while function runs
                            
                                Specifying dtype float32 with pandas.read_csv on pandas 0.10.1
                            
                                Is there a Python language specification?
                            
                                Searching for a string in a large text file - profiling various methods in python
                            
                                Has anyone used Sphinx to document a C++ project? [closed]
                            
                                matplotlib legend location numbers
                            
                                Project design / FS layout for large django projects [closed]
                            
                                Python: efficiently check if integer is within *many* ranges
                            
                                Pandas: Assigning multiple *new* columns simultaneously
                            
                                Prevent Python packages from re-exporting imported names
                            
                                Which is the best IDE for Python For Windows [duplicate]
                            
                                Why is `continue` not allowed in a `finally` clause in Python?
                            
                                Are sets ordered like dicts in python3.6
                            
                                Passing variables to a subprocess call [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to predict time series in scikit-learn?

Tags:

python

machine-learning

time-series

scikit-learn

Roman

People also ask

1 Answers

cjohnson318

Recent Activity

Donate For Us