I am running logistic regression with sklearn.linear_model.LogisticRegression. Say I have three features in my model: A, B and C. I want to fix the coefficient of A and have sklearn estimate the coefficients of B and C so as to minimize the log loss. I know I can do this in R pretty easily using offset(), but I am not sure how to do it in sklearn.
Context: I am doing a causal analysis where I have estimated the coefficient of A using a separate instrumental variable approach. Now I also want to estimate the coefficients of B and C for predictive purposes.
I couldn't find a way to do this with sklearn directly, but you can accomplish this in Python with a GLM from statsmodels. You can find the documentation here. The interface is a little different than an sklearn-style estimator, but you can build a thin wrapper. A simple example is something like:
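import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from statsmodels.api import GLM, families

class LogisticRegressionWithOffset(BaseEstimator, ClassifierMixin):
    def fit(self, X, y, offset=None):
        # offset is the fixed part of the linear predictor, e.g. coef_A * A.
        # Note: GLM does not add an intercept; include a constant column in X if needed.
        self.fitted = GLM(y, X, family=families.Binomial(), offset=offset).fit()
        return self

    def predict_proba(self, X, offset=None):
        # Pass the offset for the rows of X being scored, not the training offset.
        p = np.asarray(self.fitted.predict(X, offset=offset)).reshape(-1, 1)
        return np.concatenate([1 - p, p], axis=1)

    def predict(self, X, offset=None):
        return (self.predict_proba(X, offset=offset)[:, 1] > 0.5).astype(int)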