
How to fix some coefficient in sklearn.linear_model.LogisticRegression

I am fitting a logistic regression with sklearn.linear_model.LogisticRegression. Say my model has three features: A, B and C. I want to fix the coefficient of A and have sklearn estimate the coefficients of B and C by minimizing the log loss. I know I can do this easily in R with offset(), but I am not sure how to do it in sklearn.

Context: I am doing a causal analysis in which I have estimated the coefficient of A using a separate instrumental variable approach. Now I also want to estimate the coefficients of B and C for predictive purposes.

Hao Hu asked Sep 16 '25 12:09



1 Answer

I couldn't find a way to do this with sklearn directly, but you can do it in Python with a GLM from statsmodels, which accepts an offset argument; see the statsmodels GLM documentation. Pass the fixed coefficient times A as the offset, and put only B and C in X. The interface is a little different from an sklearn-style estimator, but you can build a thin wrapper. A simple example is something like:

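import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from statsmodels.api import GLM, families

class LogisticRegressionWithOffset(BaseEstimator, ClassifierMixin):
    def fit(self, X, y, offset=None):
        # offset is the fixed part of the linear predictor (e.g. beta_A * A).
        # GLM does not add an intercept; include a constant column in X if needed.
        self.fitted_ = GLM(y, X, family=families.Binomial(), offset=offset).fit()
        return self

    def predict_proba(self, X, offset=None):
        # Pass the offset for the rows being scored, not the training offset.
        p = np.asarray(self.fitted_.predict(X, offset=offset)).reshape(-1, 1)
        return np.concatenate([1 - p, p], axis=1)

    def predict(self, X, offset=None):
        # Threshold the positive-class probability at 0.5.
        return (self.predict_proba(X, offset=offset)[:, 1] > 0.5).astype(int)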
Foster answered Sep 19 '25 01:09
