In sklearn, GridSearchCV can take a pipeline as its estimator and find the best parameters through cross-validation. However, standard cross-validation assigns samples to folds without regard to their order in time.
To cross-validate time series data, the training and test sets should instead be split so that the test data always comes after the training data in time.
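To make the difference concrete, here is a minimal sketch (the six samples and fold counts are illustrative assumptions) contrasting what standard k-fold produces with the walk-forward pattern described above:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(6).reshape(6, 1)  # six time-ordered samples

# Standard k-fold: a test fold can come *before* its training samples in time.
for train, test in KFold(n_splits=3).split(X):
    print("train:", train, "test:", test)
# train: [2 3 4 5] test: [0 1]   <- test data precedes training data
# train: [0 1 4 5] test: [2 3]
# train: [0 1 2 3] test: [4 5]

# The walk-forward pattern wanted here looks like this instead:
# train: [0 1]       test: [2]
# train: [0 1 2]     test: [3]
# train: [0 1 2 3]   test: [4]
# train: [0 1 2 3 4] test: [5]
```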
My thought is:
Write my own k-fold class and pass it to GridSearchCV, so I can still enjoy the convenience of pipelines. The problem is that it seems difficult to make GridSearchCV use specified training and testing indices (see the sketch after these two options).
Write a new class GridSearchWalkForwardTest similar to GridSearchCV. I have been studying the source code of grid_search.py and find it a little complicated.
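On the first idea: GridSearchCV's cv argument accepts not only a splitter object but also an iterable of (train, test) index arrays, so a hand-rolled walk-forward splitter can be passed in directly. A minimal sketch, in which the window sizes, the synthetic data, and the Ridge pipeline are all illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

def walk_forward_splits(n_samples, train_size=40, test_size=20, step=20):
    # Yield (train_indices, test_indices) pairs in which the test window
    # always lies after the training window in time.
    indices = np.arange(n_samples)
    start = 0
    while start + train_size + test_size <= n_samples:
        yield (indices[start:start + train_size],
               indices[start + train_size:start + train_size + test_size])
        start += step

rng = np.random.RandomState(0)
X, y = rng.randn(100, 3), rng.randn(100)  # synthetic stand-in data

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
search = GridSearchCV(pipe,
                      param_grid={"model__alpha": [0.1, 1.0, 10.0]},
                      cv=list(walk_forward_splits(len(X))))
search.fit(X, y)
print(search.best_params_)
```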
Any suggestion is welcome.
Forward testing (also known as walk-forward testing) is the simulation of real market data on paper only. One moves along with the live market without using real money, trading virtually in order to understand its movements better. Hence it is also called paper trading.
TimeSeriesSplit provides train/test indices to split time series data samples that are observed at fixed time intervals. In each split, the test indices must be higher than in the previous one, so shuffling in the cross-validator is inappropriate. This cross-validation object is a variation of KFold.
I think you could use TimeSeriesSplit, either instead of your own implementation or as a basis for implementing a CV method that works exactly as you describe.
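A quick sketch of the splits TimeSeriesSplit yields (the sample count and n_splits here are just for illustration):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(24).reshape(12, 2)  # twelve time-ordered samples

for train, test in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train, "test:", test)
# train: [0 1 2]             test: [3 4 5]
# train: [0 1 2 3 4 5]       test: [6 7 8]
# train: [0 1 2 3 4 5 6 7 8] test: [9 10 11]
```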
After digging around a bit, it seems someone added a max_train_size parameter to TimeSeriesSplit in this PR, which appears to do what you want.
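Putting it together, a TimeSeriesSplit with max_train_size can be handed straight to GridSearchCV along with a pipeline, which addresses the original question. The data, estimator, and parameter grid below are illustrative assumptions, and max_train_size requires a scikit-learn release that includes that PR:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X, y = rng.randn(100, 3), rng.randn(100)  # synthetic stand-in data

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# Rolling window: each training set is capped at the 50 most recent samples.
cv = TimeSeriesSplit(n_splits=5, max_train_size=50)

search = GridSearchCV(pipe,
                      param_grid={"model__alpha": [0.1, 1.0, 10.0]},
                      cv=cv)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```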