I posted this question to the Cross Validated forum and later realized it might find a more appropriate audience on Stack Overflow instead.
I am looking for a way to feed the fit object (result) obtained from Python statsmodels into cross_val_score of scikit-learn's cross_validation module. The attached link suggests that it may be possible, but I have not succeeded.
I am getting the following error:

estimator should be an estimator implementing 'fit' method, statsmodels.discrete.discrete_model.BinaryResultsWrapper object at 0x7fa6e801c590 was passed
Refer to this link.
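For context, the kind of call that produces this error looks roughly like the following (a minimal sketch with made-up data; sm.Logit is chosen only because the traceback mentions BinaryResultsWrapper):

import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import cross_val_score

X = np.random.rand(100, 3)
y = (X[:, 0] + np.random.rand(100) > 1).astype(int)

# Fitting in statsmodels returns a results object (BinaryResultsWrapper here),
# not an estimator with fit/predict in the sklearn sense.
results = sm.Logit(y, sm.add_constant(X)).fit()

# Passing that results object to cross_val_score raises the TypeError quoted above
# (the exact wording may differ between sklearn versions).
cross_val_score(results, X, y)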
While scikit-learn is slightly faster than statsmodels for 1,000 or fewer observations, the difference is not significant per a t-test analysis. scikit-learn is significantly faster for datasets with more than 1,000 observations.
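A rough sketch of how such a timing comparison might be run (the sample size, number of features, and repeat count here are arbitrary illustrations, not the original benchmark):

import timeit
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

X = np.random.rand(1000, 5)
y = X @ np.random.rand(5) + np.random.rand(1000)

# Time repeated OLS fits in each library; absolute numbers depend on the machine.
t_sm = timeit.timeit(lambda: sm.OLS(y, sm.add_constant(X)).fit(), number=100)
t_sk = timeit.timeit(lambda: LinearRegression().fit(X, y), number=100)
print(f"statsmodels OLS: {t_sm:.3f}s  sklearn LinearRegression: {t_sk:.3f}s")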
cross_val_score returns the score for each test fold, whereas cross_val_predict returns the predicted y values for each test fold. With cross_val_score(), you typically average the per-fold scores, so the result is affected by the number of folds: some folds may have high error (a poor fit) and pull the average down.
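A short sketch contrasting the two helpers on the same estimator (LinearRegression and make_regression are just illustrative choices):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, cross_val_predict

X, y = make_regression(n_samples=100, noise=10, random_state=0)

scores = cross_val_score(LinearRegression(), X, y, cv=5)   # one score per test fold
preds = cross_val_predict(LinearRegression(), X, y, cv=5)  # one out-of-fold prediction per sample

print(scores.shape)  # (5,)  -- averaging these mixes folds that may fit differently well
print(preds.shape)   # (100,) -- predictions aligned with y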
Indeed, you cannot use cross_val_score directly on statsmodels objects, because of a different interface: in statsmodels, the training data is passed directly into the model's constructor, and a separate results object holds the estimates. However, you can write a simple wrapper to make statsmodels objects look like sklearn estimators:
import statsmodels.api as sm
from sklearn.base import BaseEstimator, RegressorMixin

class SMWrapper(BaseEstimator, RegressorMixin):
    """A universal sklearn-style wrapper for statsmodels regressors"""
    def __init__(self, model_class, fit_intercept=True):
        self.model_class = model_class
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        if self.fit_intercept:
            X = sm.add_constant(X)
        self.model_ = self.model_class(y, X)
        self.results_ = self.model_.fit()
        return self

    def predict(self, X):
        if self.fit_intercept:
            X = sm.add_constant(X)
        return self.results_.predict(X)
This class implements correct fit and predict methods, so it can be used with sklearn, e.g. cross-validated or included in a pipeline, like here:
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

X, y = make_regression(random_state=1, n_samples=300, noise=100)

print(cross_val_score(SMWrapper(sm.OLS), X, y, scoring='r2'))
print(cross_val_score(LinearRegression(), X, y, scoring='r2'))
You can see that the outputs of the two models are identical, because both are OLS models, cross-validated in the same way.
[0.28592315 0.37367557 0.47972639]
[0.28592315 0.37367557 0.47972639]
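The wrapper can also be dropped into a pipeline, as mentioned above. A hedged sketch, reusing SMWrapper, sm, cross_val_score, X and y from the snippets above (StandardScaler is just an illustrative preprocessing step):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# The wrapper satisfies the sklearn estimator interface, so it can be the final step of a Pipeline.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ols", SMWrapper(sm.OLS)),
])
print(cross_val_score(pipe, X, y, scoring="r2"))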