My question about Lasso in scikit-learn is quite general:

I am doing a regression with Lasso to fit a certain number of points y_i
to features x_i. The number of points n
is strictly less than the dimension p
of the features.
Hence there exist several solutions for a given penalty coefficient alpha.
The solution given by scikit-learn depends on the starting point (by default, a vector of p
zero coefficients).
Apart from modifying the library, would you know of another library that provides the freedom to select the starting point?
Or maybe there's an obvious option I missed in scikit to choose the starting point?
It is possible to set the initial point for Lasso in scikit-learn.
But there may be an infinite set of equally good solutions, and discovering all of them requires more advanced quadratic programming methods.
To set the initial point, you just initialize the model with warm_start=True
and set its coef_
attribute before calling fit.
Like this:
from sklearn.linear_model import Lasso

model = Lasso(warm_start=True)
model.coef_ = manual_initial_coef  # your starting point: an array of shape (n_features,)
model.fit(X, y)
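A minimal self-contained sketch of the same idea, with synthetic data (the shapes and alpha value are just for illustration, not anything prescribed by scikit-learn):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(10, 20)  # n=10 points, p=20 features, so n < p
y = rng.randn(10)

model = Lasso(alpha=0.1, warm_start=True)
model.coef_ = np.ones(20)  # manual starting point instead of the default zeros
model.fit(X, y)
print(model.coef_.shape)  # one coefficient per feature
```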
This works because the code inside the scikit-learn Lasso implementation contains

if not self.warm_start or not hasattr(self, "coef_"):
    coef_ = np.zeros((n_targets, n_features), dtype=X.dtype,
                     order='F')
else:
    coef_ = self.coef_
    if coef_.ndim == 1:
        coef_ = coef_[np.newaxis, :]
In my opinion, however, the default initial coefficients (zeros) are best for most problems. Indeed, when you apply lasso you usually expect most of the final coefficients to be zero, so why not start from all zeros?
In the case of a degenerate design matrix, the Lasso solution is indeed non-unique. But there cannot be multiple disjoint local optima (as in neural networks), because the cost function is still (non-strictly) convex. Instead, there may be a continuous (and also convex) set of equally good solutions. The simplest case of such ambiguity is when X
consists of two identical columns: the coefficient vectors (beta, 0)
and (0, beta)
, and all their convex combinations, do equally well.
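To make the ambiguity concrete, here is a small numeric check (synthetic data; the objective follows scikit-learn's Lasso formulation, (1/(2n))*||y - X*beta||^2 + alpha*||beta||_1): splitting a coefficient between two identical columns changes neither the fit nor the L1 penalty, so the objective is the same.

```python
import numpy as np

rng = np.random.RandomState(0)
col = rng.randn(50)
X = np.column_stack([col, col])   # design matrix with two identical columns
y = 3.0 * col + 0.1 * rng.randn(50)
alpha = 0.1

def lasso_objective(beta):
    # (1/(2n)) * ||y - X beta||^2 + alpha * ||beta||_1
    n = len(y)
    resid = y - X @ beta
    return resid @ resid / (2 * n) + alpha * np.abs(beta).sum()

b = 2.5  # an arbitrary coefficient value to distribute between the columns
print(lasso_objective(np.array([b, 0.0])))
print(lasso_objective(np.array([0.0, b])))
print(lasso_objective(np.array([b / 2, b / 2])))  # a convex combination
```

All three candidate solutions produce identical objective values, so a coordinate-descent solver has no reason to prefer one over another beyond its starting point.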
If that is the case, simply restarting from multiple random points will not give you the whole set of solutions. Instead, you need either special techniques to identify its corner (extreme) points, or some way to define the "best" solution within this set. One way of defining a unique "best" solution is the LARS algorithm (sklearn.linear_model.Lars
), which gives "equal rights" to all covariates in indeterminate cases.
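A minimal usage sketch of Lars on an underdetermined problem (random data purely for illustration; scikit-learn also provides LassoLars if you want the L1-penalized variant of the same path algorithm):

```python
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.RandomState(0)
X = rng.randn(10, 20)  # again n < p, so the least-squares problem is underdetermined
y = rng.randn(10)

model = Lars()
model.fit(X, y)
print(model.coef_.shape)  # one coefficient per feature, most of them zero
```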