I have a classic linear regression problem of the form:

y = X b

where y is a response vector, X is a matrix of input variables, and b is the vector of fit parameters I am searching for.

Python provides b = numpy.linalg.lstsq(X, y) for solving problems of this form. However, when I use this I tend to get either extremely large or extremely small values for the components of b.

I'd like to perform the same fit, but constrain the values of b between 0 and 255.

It looks like scipy.optimize.fmin_slsqp() is an option, but I found it extremely slow for the size of problem I'm interested in (X is something like 3375 by 1500, and hopefully even larger).
Is there a way to constrain the b coefficient values?
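For concreteness, here is a small sketch of the unconstrained fit described in the question; the array shapes are illustrative placeholders, not the real 3375 x 1500 problem:

```python
import numpy as np

# Small illustrative problem; the real X is ~3375 x 1500.
rng = np.random.default_rng(0)
X = rng.random((20, 5))
y = rng.random(20) * 255

# Unconstrained least squares: nothing keeps the components
# of b inside [0, 255] -- they can come out large or negative.
b, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(b)
```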
Linear Regression Theory. Linear regression predicts the value of a dependent variable (y) from a given independent variable (x). In other words, this technique finds a linear relationship between x (input) and y (output), hence the name Linear Regression.
When implementing linear regression of some dependent variable y on the set of independent variables x = (x₁, …, xᵣ), where r is the number of predictors, you assume a linear relationship between y and x: y = β₀ + β₁x₁ + ⋯ + βᵣxᵣ + ε. This equation is the regression equation.
Lasso is a modification of linear regression in which the model is penalized for the sum of the absolute values of the weights. As a result, the absolute values of the weights are (in general) reduced, and many tend to be exactly zero.
Recent scipy versions include a bounded least-squares solver:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.lsq_linear.html#scipy.optimize.lsq_linear
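A minimal sketch of scipy.optimize.lsq_linear with the 0 to 255 box constraint from the question; the problem dimensions here are placeholders:

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
X = rng.random((30, 8))
y = rng.random(30) * 255

# Box-constrained least squares: every component of b
# is forced to stay within [0, 255].
res = lsq_linear(X, y, bounds=(0, 255))
b = res.x
print(b)
```

lsq_linear returns an OptimizeResult; the fitted coefficients are in res.x, and res.success reports whether the solver converged.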
You mention you would find Lasso Regression or Ridge Regression acceptable. These and many other constrained linear models are available in the scikit-learn package. Check out the section on generalized linear models.
Constraining the coefficients usually involves some kind of regularization parameter (C or alpha); some of the models (the ones ending in CV) can use cross-validation to set these parameters automatically. You can also further constrain models to use only positive coefficients; for example, there is an option for this on the Lasso model.
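A short sketch of the Lasso option just mentioned, assuming scikit-learn is installed; note that positive=True only enforces the lower bound of 0, not the upper bound of 255, and the alpha value here is an arbitrary placeholder:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.random((30, 8))
y = rng.random(30) * 255

# alpha is the regularization strength; positive=True constrains
# every coefficient to be >= 0 (many will be shrunk to exactly 0).
model = Lasso(alpha=1.0, positive=True)
model.fit(X, y)
print(model.coef_)
```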