
Lasso - Choose the initial point in scikit coordinate descent

My question is a fairly general one about Lasso in scikit:

I am doing a regression with Lasso to fit a number of points y_i against features x_i. The number of points n is strictly less than the dimension p of the features.

Hence several solutions can exist for a given penalty coefficient alpha.

The solution given by scikit depends on the starting point (by default, a vector of p zero coefficients).

Apart from modifying the library, would you know of another library that provides the freedom to select the starting point?

Or maybe there's an obvious option I missed in scikit to choose the starting point?

Colonel Beauvel asked Oct 21 '22


1 Answer

It is possible to set the initial point for Lasso in scikit-learn.

But there may be an infinite set of equally good solutions; to discover all of them, you would need more advanced quadratic programming methods.

To set the initial point, you just initialize the model with warm_start=True and set its coef_ attribute before calling fit.

Like this:

from sklearn.linear_model import Lasso

model = Lasso(warm_start=True)
model.coef_ = manual_initial_coef  # your chosen starting point, a float array of shape (n_features,)
model.fit(X, y)
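
For instance, here is a minimal runnable sketch on synthetic data with n < p (the data and the alpha value are illustrative). Starting from the already-known solution should typically need fewer coordinate-descent passes than the default all-zeros start, which you can check via the n_iter_ attribute:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
n, p = 10, 50                        # n < p, as in the question
X = rng.randn(n, p)
y = rng.randn(n)

cold = Lasso(alpha=0.1).fit(X, y)    # default start: all zeros
warm = Lasso(alpha=0.1, warm_start=True)
warm.coef_ = cold.coef_.copy()       # start at the known solution
warm.fit(X, y)
print(cold.n_iter_, warm.n_iter_)    # the warm start typically needs fewer iterations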

This works because the scikit-learn Lasso implementation contains the following code:

if not self.warm_start or not hasattr(self, "coef_"):
    coef_ = np.zeros((n_targets, n_features), dtype=X.dtype,
                     order='F')
else:
    coef_ = self.coef_
    if coef_.ndim == 1:
        coef_ = coef_[np.newaxis, :]

In my opinion, however, the default initial coefficients (zeros) are the best choice for most problems. Indeed, when you apply lasso you usually expect most of its final coefficients to be zero - why not start from all zeros?

In the case of a degenerate design matrix, the Lasso solution is indeed non-unique. But there cannot be multiple disjoint local optima (as in neural networks), because the cost function is still (non-strictly) convex. Instead, there may be a continuous (and also convex) set of equally good solutions. The simplest case of such ambiguity is when X consists of two identical columns: the coefficient vectors (beta, 0) and (0, beta), and all their convex combinations, do equally well.
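
To make this concrete, here is a small sketch of that effect (synthetic data, illustrative alpha): with two identical columns, coordinate descent stays near wherever it starts, so different warm starts return different but equally good coefficient vectors:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
x = rng.randn(20)
X = np.column_stack([x, x])          # two identical columns
y = 3.0 * x

for start in (np.array([3.0, 0.0]), np.array([0.0, 3.0])):
    model = Lasso(alpha=0.01, warm_start=True, max_iter=10000)
    model.coef_ = start.copy()
    model.fit(X, y)
    print(model.coef_)               # different splits of (roughly) the same total weight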

If that is the case, simply restarting from multiple random points will not give you the whole set of solutions. Instead, you need either special techniques to somehow enumerate its corner (extreme) points, or some way to define the "best" solution within this set. One way of defining a unique "best" solution is the LARS algorithm (sklearn.linear_model.Lars), which gives "equal rights" to all covariates in the indeterminate cases.
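
For completeness, a minimal sketch of trying Lars on the same kind of underdetermined problem (the data is again synthetic; n_nonzero_coefs caps the number of active covariates):

import numpy as np
from sklearn.linear_model import Lars

rng = np.random.RandomState(0)
X = rng.randn(10, 50)                # n < p, as in the question
y = rng.randn(10)

model = Lars(n_nonzero_coefs=5)
model.fit(X, y)
print(np.flatnonzero(model.coef_))   # indices of the selected features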

David Dale answered Oct 23 '22