I'd like to use scikit-learn's GridSearchCV to determine some hyperparameters for a random forest model. My data is time-dependent and looks something like
import pandas as pd

train = pd.DataFrame({
    'date': pd.DatetimeIndex(['2012-1-1', '2012-9-30', '2013-4-3',
                              '2014-8-16', '2015-3-20', '2015-6-30']),
    'feature1': [1.2, 3.3, 2.7, 4.0, 8.2, 6.5],
    'feature2': [4, 4, 10, 3, 10, 9],
    'target': [1, 2, 1, 3, 2, 2]})
>>> train
date feature1 feature2 target
0 2012-01-01 1.2 4 1
1 2012-09-30 3.3 4 2
2 2013-04-03 2.7 10 1
3 2014-08-16 4.0 3 3
4 2015-03-20 8.2 10 2
5 2015-06-30 6.5 9 2
How can I implement the following cross-validation folding technique?
train:(2012, 2013) - test:(2014)
train:(2013, 2014) - test:(2015)
That is, I want to use 2 years of historic observations to train a model and then test its accuracy in the subsequent year.
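A minimal sketch of one way to build exactly these folds: derive each row's year and construct the (train, test) index pairs by hand. The year_splits and test_year names are illustrative, not a scikit-learn API; GridSearchCV can consume such a list directly, as shown further below.

import numpy as np

years = train['date'].dt.year
year_splits = []
for test_year in [2014, 2015]:
    # train on the two preceding years, test on test_year
    train_idx = np.where(years.isin([test_year - 2, test_year - 1]))[0]
    test_idx = np.where(years == test_year)[0]
    year_splits.append((train_idx, test_idx))

for tr, te in year_splits:
    print('train years:', sorted(years.iloc[tr].unique()),
          '- test years:', sorted(years.iloc[te].unique()))
# train years: [2012, 2013] - test years: [2014]
# train years: [2013, 2014] - test years: [2015]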
Alternatively, scikit-learn ships a built-in time-series cross-validator, TimeSeriesSplit. It provides train/test indices to split time-series samples that are observed at fixed time intervals. In each split, the test indices must be higher than in earlier splits, so shuffling in the cross-validator is inappropriate. This cross-validation object is a variation of KFold.
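As a short sketch on the example frame above (assuming the rows are already sorted by date, as they are here):

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=2)
for train_idx, test_idx in tscv.split(train[['feature1', 'feature2']]):
    print('train indices:', train_idx, '- test indices:', test_idx)
# train indices: [0 1] - test indices: [2 3]
# train indices: [0 1 2 3] - test indices: [4 5]

Note that TimeSeriesSplit splits by row position, not by calendar year, so it only approximates the year-based folds asked for above; for exact year boundaries you still need hand-built splits like the ones sketched earlier.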
Bear in mind that GridSearchCV explicitly tries every combination in the parameter grid, which makes it computationally expensive. When cross-validation is used in the inner loop of the grid search, this is called grid-search cross-validation; the optimization objective then becomes minimizing the average loss obtained on the k folds.
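Putting the two together, here is a minimal sketch of grid-searching a random forest with TimeSeriesSplit as the inner cross-validator (the parameter grid is illustrative, not a recommendation):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

X = train[['feature1', 'feature2']]
y = train['target']
# illustrative grid; a real search would likely be larger
param_grid = {'n_estimators': [10, 50], 'max_depth': [2, 4]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid,
                      cv=TimeSeriesSplit(n_splits=2))
search.fit(X, y)
print(search.best_params_)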
To recap, scikit-learn offers TimeSeriesSplit for time-series validation. It splits the training data into consecutive segments; the model is trained with a given set of hyperparameters on the earlier segments and tested on the one that follows.
The cv argument of GridSearchCV determines the cross-validation splitting strategy. Possible inputs for cv include an iterable yielding (train, test) splits as arrays of indices. For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used; in all other cases, KFold is used.
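That iterable option is the hook that answers the original question: the year-based year_splits list sketched earlier can be passed directly as cv (reusing X, y and param_grid from the sketch above):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid,
                      cv=year_splits)   # explicit (train, test) index pairs
search.fit(X, y)
print(search.best_params_)   # best hyperparameters on the year-based folds

Because an explicit iterable is supplied, neither StratifiedKFold nor KFold kicks in: the model is always trained on two years of data and tested on the subsequent year.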
Finally, note that with TimeSeriesSplit, unlike standard cross-validation methods, successive training sets are supersets of those that come before them: in each split the test indices are higher than in the previous one, which is why shuffling is never applied.