I posted this question to the Cross Validated forum and later realized it might find a more appropriate audience on Stack Overflow instead.
I am looking for a way to feed the fit object (result) obtained from Python statsmodels into cross_val_score of scikit-learn's cross_validation module. The attached link suggests that it may be possible, but I have not succeeded.
I am getting the following error:

estimator should be an estimator implementing 'fit' method, statsmodels.discrete.discrete_model.BinaryResultsWrapper object at 0x7fa6e801c590 was passed
Refer to this link.
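For context, the kind of call that produces this error looks roughly like the following (a minimal sketch with made-up data; sm.Logit is chosen only because the traceback mentions BinaryResultsWrapper):

import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import cross_val_score

X = np.random.rand(100, 3)
y = (X[:, 0] + np.random.rand(100) > 1).astype(int)

# Fitting in statsmodels returns a results object (BinaryResultsWrapper here),
# not an estimator with fit/predict in the sklearn sense.
results = sm.Logit(y, sm.add_constant(X)).fit()

# Passing that results object to cross_val_score raises the TypeError quoted above
# (the exact wording may differ between sklearn versions).
cross_val_score(results, X, y)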
While scikit-learn is slightly faster than statsmodels for 1,000 or fewer observations, the difference is not significant per a t-test analysis. scikit-learn is significantly faster for datasets with more than 1,000 observations.
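A rough sketch of how such a timing comparison might be run (the sample size, number of features, and repeat count here are arbitrary illustrations, not the original benchmark):

import timeit
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

X = np.random.rand(1000, 5)
y = X @ np.random.rand(5) + np.random.rand(1000)

# Time repeated OLS fits in each library; absolute numbers depend on the machine.
t_sm = timeit.timeit(lambda: sm.OLS(y, sm.add_constant(X)).fit(), number=100)
t_sk = timeit.timeit(lambda: LinearRegression().fit(X, y), number=100)
print(f"statsmodels OLS: {t_sm:.3f}s  sklearn LinearRegression: {t_sk:.3f}s")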
cross_val_score returns the score for each test fold, whereas cross_val_predict returns the predicted y values for each test fold. With cross_val_score(), you typically average the per-fold scores, so the result is affected by the number of folds: some folds may have high error (a poor fit) and pull the average down.
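A short sketch contrasting the two helpers on the same estimator (LinearRegression and make_regression are just illustrative choices):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, cross_val_predict

X, y = make_regression(n_samples=100, noise=10, random_state=0)

scores = cross_val_score(LinearRegression(), X, y, cv=5)   # one score per test fold
preds = cross_val_predict(LinearRegression(), X, y, cv=5)  # one out-of-fold prediction per sample

print(scores.shape)  # (5,)  -- averaging these mixes folds that may fit differently well
print(preds.shape)   # (100,) -- predictions aligned with y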
Indeed, you cannot use cross_val_score directly on statsmodels objects, because of a different interface: in statsmodels, the training data is passed directly into the model's constructor, and a separate results object holds the estimates. However, you can write a simple wrapper to make statsmodels objects look like sklearn estimators:
import statsmodels.api as sm
from sklearn.base import BaseEstimator, RegressorMixin

class SMWrapper(BaseEstimator, RegressorMixin):
    """A universal sklearn-style wrapper for statsmodels regressors"""
    def __init__(self, model_class, fit_intercept=True):
        self.model_class = model_class
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        if self.fit_intercept:
            X = sm.add_constant(X)
        self.model_ = self.model_class(y, X)
        self.results_ = self.model_.fit()
        return self

    def predict(self, X):
        if self.fit_intercept:
            X = sm.add_constant(X)
        return self.results_.predict(X)
This class implements correct fit and predict methods, so it can be used with sklearn, e.g. cross-validated or included in a pipeline, like here:
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

X, y = make_regression(random_state=1, n_samples=300, noise=100)

print(cross_val_score(SMWrapper(sm.OLS), X, y, scoring='r2'))
print(cross_val_score(LinearRegression(), X, y, scoring='r2'))
You can see that the outputs of the two models are identical, because both are OLS models, cross-validated in the same way.
[0.28592315 0.37367557 0.47972639]
[0.28592315 0.37367557 0.47972639]
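The wrapper can also be dropped into a pipeline, as mentioned above. A hedged sketch, reusing SMWrapper, sm, cross_val_score, X and y from the snippets above (StandardScaler is just an illustrative preprocessing step):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# The wrapper satisfies the sklearn estimator interface, so it can be the final step of a Pipeline.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ols", SMWrapper(sm.OLS)),
])
print(cross_val_score(pipe, X, y, scoring="r2"))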