I am currently running a multiple linear regression on a dataset. At first, I didn't realize I needed to put constraints on my weights; in fact, I need specific weights to be positive or negative.
To be more precise, I am building a scoring system, which is why some of my variables should have a positive or negative impact on the score. Yet when I run my model, the results do not match my expectations: some of my 'positive' variables get negative coefficients and vice versa.
As an example, suppose my model is:
y = W0*x0 + W1*x1 + W2*x2
where x2 is a 'positive' variable; I would like to constrain W2 to be positive.
I have been looking around a lot for this issue but have not found anything about constraints on specific weights/coefficients; everything I found is about forcing all coefficients to be positive or making them sum to one.
I am working in Python with scikit-learn. This is how I get my best model:
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older scikit-learn versions

def ridge(Xtrain, Xtest, Ytrain, Ytest, position):
    # search over the regularization strength with 3-fold cross-validation
    param_grid = {'alpha': [0.01, 0.1, 1, 10, 50, 100, 1000]}
    gs = GridSearchCV(Ridge(), param_grid=param_grid, n_jobs=-1, cv=3)
    gs.fit(Xtrain, Ytrain)
    hatytrain = gs.predict(Xtrain)
    hatytest = gs.predict(Xtest)
    return gs, hatytrain, hatytest
Any idea how I could assign a constraint to the coefficient of a specific variable? It will probably be tedious to define each constraint, but I have no idea how to do it otherwise.
Scikit-learn does not allow such constraints on the coefficients.
But you can impose arbitrary constraints on the coefficients and optimize the loss with coordinate descent if you implement your own estimator. In the unconstrained case, coordinate descent produces the same result as OLS in a reasonable number of iterations.
I've written a class that imposes upper and lower bounds on LinearRegression coefficients. You can extend it to use a Ridge or even Lasso penalty if you want:
import numpy as np
from sklearn.base import RegressorMixin
from sklearn.linear_model.base import LinearModel  # sklearn.linear_model._base in scikit-learn >= 0.22
from sklearn.utils import check_X_y


class ConstrainedLinearRegression(LinearModel, RegressorMixin):

    def __init__(self, fit_intercept=True, normalize=False, copy_X=True, nonnegative=False, tol=1e-15):
        self.fit_intercept = fit_intercept
        self.normalize = normalize
        self.copy_X = copy_X
        self.nonnegative = nonnegative
        self.tol = tol

    def fit(self, X, y, min_coef=None, max_coef=None):
        X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'], y_numeric=True, multi_output=False)
        X, y, X_offset, y_offset, X_scale = self._preprocess_data(
            X, y, fit_intercept=self.fit_intercept, normalize=self.normalize, copy=self.copy_X)
        # per-coefficient bounds; unbounded by default
        self.min_coef_ = min_coef if min_coef is not None else np.repeat(-np.inf, X.shape[1])
        self.max_coef_ = max_coef if max_coef is not None else np.repeat(np.inf, X.shape[1])
        if self.nonnegative:
            self.min_coef_ = np.clip(self.min_coef_, 0, None)

        beta = np.zeros(X.shape[1]).astype(float)
        prev_beta = beta + 1
        hessian = np.dot(X.transpose(), X)
        # coordinate descent: update one coefficient at a time with a Newton step,
        # then clip it back into its [min_coef_, max_coef_] interval
        while not (np.abs(prev_beta - beta) < self.tol).all():
            prev_beta = beta.copy()
            for i in range(len(beta)):
                grad = np.dot(np.dot(X, beta) - y, X)
                beta[i] = np.minimum(self.max_coef_[i],
                                     np.maximum(self.min_coef_[i],
                                                beta[i] - grad[i] / hessian[i, i]))

        self.coef_ = beta
        self._set_intercept(X_offset, y_offset, X_scale)
        return self
You can use this class, for example, to make all coefficients non-negative:
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; use any regression dataset on newer versions
from sklearn.linear_model import LinearRegression

X, y = load_boston(return_X_y=True)

model = ConstrainedLinearRegression(nonnegative=True)
model.fit(X, y)
print(model.intercept_)
print(model.coef_)
This produces an output like
-36.99292986145538
[0. 0.05286515 0. 4.12512386 0. 8.04017956
0. 0. 0. 0. 0. 0.02273805
0. ]
You can see that most coefficients are zero. An ordinary LinearRegression would have made some of them negative:
model = LinearRegression()
model.fit(X, y)
print(model.intercept_)
print(model.coef_)
which returns
36.49110328036191
[-1.07170557e-01 4.63952195e-02 2.08602395e-02 2.68856140e+00
-1.77957587e+01 3.80475246e+00 7.51061703e-04 -1.47575880e+00
3.05655038e-01 -1.23293463e-02 -9.53463555e-01 9.39251272e-03
-5.25466633e-01]
You can also impose arbitrary bounds on any coefficients you choose - that's what you asked for. For example, with this setup
model = ConstrainedLinearRegression()
min_coef = np.repeat(-np.inf, X.shape[1])
min_coef[0] = 0
min_coef[4] = -1
max_coef = np.repeat(4, X.shape[1])
max_coef[3] = 2
model.fit(X, y, max_coef=max_coef, min_coef=min_coef)
print(model.intercept_)
print(model.coef_)
you would get an output
24.060175576410515
[ 0. 0.04504673 -0.0354073 2. -1. 4.
-0.01343263 -1.17231216 0.2183103 -0.01375266 -0.7747823 0.01122374
-0.56678676]
Update. This solution can be adapted to handle constraints on linear combinations of the coefficients (e.g. their sum): in that case, the individual bounds for each coefficient are recalculated at every step, as in the sketch below. This GitHub gist provides an example.
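For illustration, here is a minimal, self-contained sketch of that idea (not the gist itself): a standalone coordinate-descent routine where, before each update, the upper bound of the current coefficient is tightened so that the sum of all coefficients stays below a cap. The function name and the coef_sum_max parameter are my own, hypothetical names, not part of the class above.
import numpy as np

def box_and_sum_constrained_ols(X, y, min_coef, max_coef, coef_sum_max,
                                tol=1e-10, max_iter=10_000):
    """Coordinate descent for OLS with per-coefficient bounds and sum(beta) <= coef_sum_max (sketch)."""
    beta = np.zeros(X.shape[1])
    hessian = X.T @ X
    for _ in range(max_iter):
        prev = beta.copy()
        for i in range(len(beta)):
            grad = X.T @ (X @ beta - y)
            # tighten the upper bound of beta[i], given that the other coefficients are held fixed
            upper = min(max_coef[i], coef_sum_max - (beta.sum() - beta[i]))
            beta[i] = np.clip(beta[i] - grad[i] / hessian[i, i], min_coef[i], upper)
        if np.abs(prev - beta).max() < tol:
            break
    return beta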
UPDATE
Due to the popularity of this question, I have created a package with my implementation of constrained linear regression: https://github.com/avidale/constrained-linear-regression.
You can install it with pip install constrained-linear-regression. Pull requests are welcome!
Starting with scikit-learn 0.24, you can force LinearRegression to produce only non-negative coefficients by passing positive=True. By multiplying the columns for which you want a negative coefficient by -1, you should get what you want.
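For example, a minimal sketch of that trick could look like this (the synthetic data from make_regression and the choice of column index 4 are purely illustrative):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=6, noise=1.0, random_state=0)

neg_idx = 4                      # the feature that should get a negative coefficient
X_flipped = X.copy()
X_flipped[:, neg_idx] *= -1      # flip its sign so a non-negative fit maps to a negative weight

model = LinearRegression(positive=True)   # requires scikit-learn >= 0.24
model.fit(X_flipped, y)

coef = model.coef_.copy()
coef[neg_idx] *= -1              # translate back to the original (unflipped) feature
print(model.intercept_, coef)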