How to implement walk-forward testing in sklearn?

In sklearn, GridSearchCV can take a pipeline as a parameter to find the best estimator through cross-validation. However, the usual cross-validation splits the data like this:

[Image: usual cross-validation splits]

To cross-validate time series data, the training and testing data are often split like this:

[Image: time-series splits, each test set placed after its training set]

That is, the testing data should always come after (later in time than) the training data.
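To make the desired split pattern concrete, here is a minimal sketch of an expanding-window walk-forward splitter in plain NumPy. The function name `walk_forward_splits` and its `n_splits`/`test_size` parameters are hypothetical, chosen only for illustration; each test block starts right after the last training index.

```python
import numpy as np


def walk_forward_splits(n_samples, n_splits, test_size):
    """Hypothetical helper: yield (train_idx, test_idx) pairs where every
    test block comes strictly after its training block (expanding window)."""
    indices = np.arange(n_samples)
    # Place the first test block so that the last block ends at n_samples.
    first_test_start = n_samples - n_splits * test_size
    if first_test_start < 1:
        raise ValueError("Not enough samples for the requested splits.")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        # Train on everything before the test block, test on the next block.
        yield indices[:test_start], indices[test_start:test_start + test_size]


for train_idx, test_idx in walk_forward_splits(n_samples=10, n_splits=3, test_size=2):
    print("train:", train_idx, "test:", test_idx)
```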

My ideas are:

  1. Write my own k-fold-style class and pass it to GridSearchCV so I can keep the convenience of a pipeline. The problem is that it seems difficult to make GridSearchCV use specified indices for the training and testing data.

  2. Write a new class, GridSearchWalkForwardTest, similar to GridSearchCV. I am studying the source code of grid_search.py and find it a little complicated.
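Regarding option 1: GridSearchCV's `cv` argument accepts an iterable of `(train_indices, test_indices)` pairs, so a hand-built list of walk-forward splits can be passed directly while keeping the pipeline. The sketch below assumes a recent scikit-learn (`sklearn.model_selection` rather than the older `grid_search` module); the synthetic data and the `StandardScaler` + `Ridge` pipeline are placeholders for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, time-ordered data (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(100)

# Hand-built walk-forward splits: each test block follows its training block.
splits = [
    (np.arange(0, 40), np.arange(40, 60)),
    (np.arange(0, 60), np.arange(60, 80)),
    (np.arange(0, 80), np.arange(80, 100)),
]

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
search = GridSearchCV(pipe, param_grid={"model__alpha": [0.1, 1.0, 10.0]}, cv=splits)
search.fit(X, y)
print(search.best_params_)
```

Because the scaler lives inside the pipeline, it is refit on each training block only, so no statistics leak from the later test period.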

Any suggestion is welcome.

asked Aug 11 '15 at 16:08 by PhilChang



People also ask

What is walk forward testing?

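Forward testing (sometimes conflated with walk-forward testing) is the simulation of trading on real market data on paper only: one follows the markets live without using real money, trading virtually to understand their movements better. Hence it is also called paper trading. In the backtesting and cross-validation sense used in the question, walk-forward testing instead means repeatedly fitting on a window of past data, evaluating on the period that immediately follows it, and then moving the window forward.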

What is TimeSeriesSplit in Python?

It provides train/test indices to split time series data samples, observed at fixed time intervals, into train/test sets. In each split, the test indices must be higher than in the previous split, so shuffling in the cross-validator is inappropriate. This cross-validation object is a variation of KFold.
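A short illustration of this behaviour, assuming `sklearn.model_selection.TimeSeriesSplit` with its default settings: with 8 samples and `n_splits=3`, each training set grows and each test block comes after it.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(8).reshape(-1, 1)  # 8 time-ordered samples
tscv = TimeSeriesSplit(n_splits=3)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # The training window expands; the test block always follows it.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
# fold 0: train=[0, 1] test=[2, 3]
# fold 1: train=[0, 1, 2, 3] test=[4, 5]
# fold 2: train=[0, 1, 2, 3, 4, 5] test=[6, 7]
```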


1 Answer

I think you could use TimeSeriesSplit, either instead of your own implementation or as the basis for a CV method that does exactly what you describe.
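Since TimeSeriesSplit is an ordinary CV splitter, it can be handed to GridSearchCV's `cv` parameter together with a pipeline, so the pipeline convenience is kept without writing a new grid-search class. A minimal sketch assuming a recent scikit-learn; the noisy sine-wave data, the 5 lagged features, the `StandardScaler` + `SVR` pipeline and the parameter grid are all illustrative placeholders.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic time-ordered data: a noisy sine wave (illustrative only).
rng = np.random.RandomState(42)
series = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.randn(200)

# Predict each value from the 5 values before it.
X = np.column_stack([series[i:i + 195] for i in range(5)])
y = series[5:]

# make_pipeline names the steps "standardscaler" and "svr".
pipe = make_pipeline(StandardScaler(), SVR())
param_grid = {"svr__C": [0.1, 1.0, 10.0], "svr__epsilon": [0.01, 0.1]}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),  # every test fold lies after its training data
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```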

After digging around a bit, it seems someone added a max_train_size parameter to TimeSeriesSplit in this PR, which appears to do what you want.
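With `max_train_size` set, the training set becomes a fixed-length window that rolls forward instead of growing, which matches the walk-forward layout in the question. A sketch assuming a scikit-learn version that includes this parameter, using 8 samples and a cap of 3 training samples:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(8).reshape(-1, 1)  # 8 time-ordered samples
tscv = TimeSeriesSplit(n_splits=3, max_train_size=3)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Once enough history exists, only the 3 most recent samples are used for training.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
# fold 0: train=[0, 1] test=[2, 3]
# fold 1: train=[1, 2, 3] test=[4, 5]
# fold 2: train=[3, 4, 5] test=[6, 7]
```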

answered Sep 27 '22 at 21:09 by Matthijs Brouns
