I've searched the sklearn docs for TimeSeriesSplit and the docs for cross-validation, but I haven't been able to find a working example.
I'm using sklearn version 0.19.
This is my setup:

    import xgboost as xgb
    from sklearn.model_selection import TimeSeriesSplit
    from sklearn.grid_search import GridSearchCV
    import numpy as np

    X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
    y = np.array([1, 6, 7, 1, 2, 3])

    tscv = TimeSeriesSplit(n_splits=2)
    for train, test in tscv.split(X):
        print(train, test)

gives:

    [0 1] [2 3]
    [0 1 2 3] [4 5]
If I try:

    model = xgb.XGBRegressor()
    param_search = {'max_depth' : [3, 5]}

    my_cv = TimeSeriesSplit(n_splits=2).split(X)
    gsearch = GridSearchCV(estimator=model, cv=my_cv, param_grid=param_search)
    gsearch.fit(X, y)

it gives:

    TypeError: object of type 'generator' has no len()
I get the problem: GridSearchCV is trying to call len(cv), but my_cv is an iterator without length. However, the docs for GridSearchCV state that cv can be an "int, cross-validation generator or an iterable, optional".

I tried using TimeSeriesSplit without the .split(X), but it still didn't work.
I'm sure I'm overlooking something simple, thanks!!
GridSearchCV tries every combination of the values passed in the parameter dictionary and evaluates the model for each combination using cross-validation. After fitting, we therefore have a cross-validated score for every combination of hyperparameters and can choose the one with the best performance.

cv: the cross-validation strategy used to evaluate each combination of hyperparameters. It can be an integer number of folds, a cross-validation splitter such as TimeSeriesSplit, or an iterable of (train, test) index splits.

verbose: set it to 1 or higher to get progress messages printed while GridSearchCV fits the data.
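To make this concrete, here is a minimal sketch of such a search on the question's toy data, with a TimeSeriesSplit object passed as cv. It assumes a recent scikit-learn and swaps in DecisionTreeRegressor for XGBRegressor so that it runs without xgboost installed; the scoring choice is also my assumption, not something from the question.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor

# Toy data from the question: 6 time-ordered samples, 2 features.
X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
y = np.array([1, 6, 7, 1, 2, 3])

# Stand-in estimator (assumption): any regressor with a max_depth
# parameter would do; XGBRegressor is used in the question.
model = DecisionTreeRegressor(random_state=0)
param_search = {"max_depth": [3, 5]}

# The splitter object itself is passed as cv, not the result of .split(X).
tscv = TimeSeriesSplit(n_splits=2)
gsearch = GridSearchCV(
    estimator=model,
    param_grid=param_search,
    cv=tscv,
    scoring="neg_mean_squared_error",  # assumption: MSE-based scoring
)
gsearch.fit(X, y)

# One cross-validated score per hyperparameter combination.
results = gsearch.cv_results_
for params, score in zip(results["params"], results["mean_test_score"]):
    print(params, score)

print("best:", gsearch.best_params_)
```

With so few samples the scores themselves mean very little; the point is only to show where the per-combination scores and the selected parameters end up.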
It turns out the problem was that I was using GridSearchCV from sklearn.grid_search, which is deprecated. Importing GridSearchCV from sklearn.model_selection resolved the problem:
    import xgboost as xgb
    from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
    import numpy as np

    X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
    y = np.array([1, 6, 7, 1, 2, 3])

    model = xgb.XGBRegressor()
    param_search = {'max_depth' : [3, 5]}

    tscv = TimeSeriesSplit(n_splits=2)
    gsearch = GridSearchCV(estimator=model, cv=tscv, param_grid=param_search)
    gsearch.fit(X, y)
gives:
    GridSearchCV(cv=<generator object TimeSeriesSplit.split at 0x11ab4abf8>,
           error_score='raise',
           estimator=XGBRegressor(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
           gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3,
           min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
           objective='reg:linear', reg_alpha=0, reg_lambda=1,
           scale_pos_weight=1, seed=0, silent=True, subsample=1),
           fit_params=None, iid=True, n_jobs=1,
           param_grid={'max_depth': [3, 5]}, pre_dispatch='2*n_jobs', refit=True,
           return_train_score=True, scoring=None, verbose=0)
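As a side note, the original TypeError comes from the fact that a generator has no length. If you need to pass explicit splits rather than the splitter object, one possible workaround is to materialize them into a list first. The sketch below assumes a recent scikit-learn and again uses DecisionTreeRegressor as a stand-in for XGBRegressor so it runs without xgboost; I have not verified it against 0.19 or the old sklearn.grid_search module.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor

X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T
y = np.array([1, 6, 7, 1, 2, 3])

tscv = TimeSeriesSplit(n_splits=2)

# A generator cannot answer len(), which is what the traceback complains about.
splits_gen = tscv.split(X)
try:
    len(splits_gen)
    generator_has_len = True
except TypeError:
    generator_has_len = False
print("generator has len:", generator_has_len)

# Materializing the splits gives a list of (train_indices, test_indices)
# tuples, which has a length and can be iterated more than once.
splits = list(tscv.split(X))
print("number of splits:", len(splits))

# Stand-in estimator (assumption); swap in xgb.XGBRegressor() if available.
gsearch = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [3, 5]},
    cv=splits,
    scoring="neg_mean_squared_error",  # assumption: MSE-based scoring
)
gsearch.fit(X, y)
print(gsearch.best_params_)
```

Passing the splitter object directly, as in the answer above, is simpler; the list form is mainly useful when the splits are custom or need to be inspected beforehand.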