
How do I use a TimeSeriesSplit with a GridSearchCV object to tune a model in scikit-learn?

I've searched the sklearn docs for TimeSeriesSplit and the docs for cross-validation but I haven't been able to find a working example.

I'm using sklearn version 0.19.

This is my setup:

    import xgboost as xgb
    from sklearn.model_selection import TimeSeriesSplit
    from sklearn.grid_search import GridSearchCV
    import numpy as np

    X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
    y = np.array([1, 6, 7, 1, 2, 3])
    tscv = TimeSeriesSplit(n_splits=2)
    for train, test in tscv.split(X):
        print(train, test)

gives:

    [0 1] [2 3]
    [0 1 2 3] [4 5]

If I try:

    model = xgb.XGBRegressor()
    param_search = {'max_depth': [3, 5]}

    my_cv = TimeSeriesSplit(n_splits=2).split(X)
    gsearch = GridSearchCV(estimator=model, cv=my_cv,
                           param_grid=param_search)
    gsearch.fit(X, y)

it gives: TypeError: object of type 'generator' has no len()

I understand the immediate cause: GridSearchCV is trying to call len(cv), but my_cv is a generator and has no length. However, the docs for GridSearchCV state that cv can be an "int, cross-validation generator or an iterable, optional".

I tried using TimeSeriesSplit without the .split(X) but it still didn't work.
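
To make the diagnosis above concrete, here is a minimal sketch (using only numpy and sklearn.model_selection) showing that the object returned by .split(X) has no length, while a list built from it does. The names generator_has_len and cv_list are illustrative only. Whether the deprecated sklearn.grid_search module would accept such a list is an assumption that this post does not confirm.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T

# .split(X) returns a generator: it can be consumed once and has no length.
my_cv = TimeSeriesSplit(n_splits=2).split(X)
try:
    len(my_cv)
    generator_has_len = True
except TypeError:
    generator_has_len = False

# Materializing a fresh generator gives a list of (train, test) index pairs,
# which has a length and can be iterated more than once.
cv_list = list(TimeSeriesSplit(n_splits=2).split(X))

print(generator_has_len, len(cv_list))  # False 2
```

A list of (train, test) index arrays is one form of the "iterable" that the GridSearchCV docs quoted above refer to.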

I'm sure I'm overlooking something simple, thanks!!

asked Oct 13 '17 14:10 by cd98



1 Answer

It turns out the problem was that I was importing GridSearchCV from sklearn.grid_search, which is deprecated. Importing GridSearchCV from sklearn.model_selection resolved it:

    import xgboost as xgb
    from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
    import numpy as np

    X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
    y = np.array([1, 6, 7, 1, 2, 3])

    model = xgb.XGBRegressor()
    param_search = {'max_depth': [3, 5]}

    tscv = TimeSeriesSplit(n_splits=2)
    gsearch = GridSearchCV(estimator=model, cv=tscv,
                           param_grid=param_search)
    gsearch.fit(X, y)

gives:

    GridSearchCV(cv=<generator object TimeSeriesSplit.split at 0x11ab4abf8>,
           error_score='raise',
           estimator=XGBRegressor(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, gamma=0,
           learning_rate=0.1, max_delta_step=0, max_depth=3,
           min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
           objective='reg:linear', reg_alpha=0, reg_lambda=1,
           scale_pos_weight=1, seed=0, silent=True, subsample=1),
           fit_params=None, iid=True, n_jobs=1,
           param_grid={'max_depth': [3, 5]}, pre_dispatch='2*n_jobs',
           refit=True, return_train_score=True, scoring=None, verbose=0)
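
For anyone who wants to try the same pattern without installing xgboost, here is a rough sketch that swaps in scikit-learn's DecisionTreeRegressor for XGBRegressor. That substitution is my own, made only so the example is self-contained; the TimeSeriesSplit and GridSearchCV wiring is the same as above. With only six samples the scores are not meaningful, so this only shows how to read the results.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor

X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
y = np.array([1, 6, 7, 1, 2, 3])

# Stand-in estimator (assumption: any scikit-learn compatible regressor works here).
model = DecisionTreeRegressor(random_state=0)
param_search = {'max_depth': [3, 5]}

# Pass the splitter object itself; GridSearchCV calls .split() internally.
tscv = TimeSeriesSplit(n_splits=2)
gsearch = GridSearchCV(estimator=model, cv=tscv, param_grid=param_search)
gsearch.fit(X, y)

# One mean test score per parameter combination, averaged over the time-ordered folds.
print(gsearch.best_params_)
print(gsearch.cv_results_['mean_test_score'])
```

Passing the TimeSeriesSplit object rather than the result of .split(X) also means the splitter can be reused for later fits, since a generator is exhausted after one pass.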
answered Sep 20 '22 12:09 by cd98