Numpy linear regression with regularization

I can't see what is wrong with my code for regularized linear regression. For the unregularized case I simply have this, which I'm reasonably certain is correct:

import numpy as np

def get_model(features, labels):
    return np.linalg.pinv(features).dot(labels)

Here's my code for the regularized solution:

def get_model(features, labels, lamb=0.0):
    n_cols = features.shape[1]
    # Normal equations with an L2 (ridge) penalty:
    # (X^T X + lamb * I)^(-1) X^T y
    return np.linalg.inv(features.T.dot(features) + lamb * np.identity(n_cols))\
            .dot(features.T).dot(labels)

With the default value of 0.0 for lamb, I expect this to give the same result as the (correct) unregularized version, but the difference between the two is actually quite large.

Does anyone see what the problem is?
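Here's a minimal way to reproduce the discrepancy with synthetic, nearly rank-deficient data (a hypothetical setup, just to show the symptom):

import numpy as np

# The third column is almost a copy of the first, so
# features.T.dot(features) is nearly singular.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
features = np.hstack([base, base[:, :1] + 1e-9 * rng.normal(size=(100, 1))])
labels = rng.normal(size=100)

pinv_model = np.linalg.pinv(features).dot(labels)
inv_model = get_model(features, labels)  # regularized version, lamb=0.0
print(np.abs(pinv_model - inv_model).max())  # large, nowhere near 0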

asked Dec 15 '14 by Marshall Farrier

1 Answer

The problem is that features.transpose().dot(features) may not be invertible: numpy.linalg.inv works only for full-rank matrices, according to the documentation. A positive regularization term, however, always makes the matrix nonsingular: features.transpose().dot(features) is positive semi-definite, so adding lamb * identity with lamb > 0 makes it positive definite, hence invertible.
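You can see this numerically from the condition number of the Gram matrix (a hypothetical illustration with nearly dependent columns):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X = np.hstack([X, X[:, :1] + 1e-9 * rng.normal(size=(100, 1))])

# Near-singular Gram matrix vs. the regularized one.
gram = X.T.dot(X)
print(np.linalg.cond(gram))                         # astronomically large
print(np.linalg.cond(gram + 0.1 * np.identity(3)))  # modest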

By the way, you are right about the implementation, but it is not efficient. An efficient (and numerically more stable) way to solve this equation is a least-squares solver.

np.linalg.lstsq(features, labels)[0] can do the work of np.linalg.pinv(features).dot(labels); note that lstsq returns a tuple whose first element is the solution.
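For example, with some hypothetical random data:

import numpy as np

rng = np.random.default_rng(2)
features = rng.normal(size=(50, 4))
labels = rng.normal(size=50)

# lstsq returns (solution, residuals, rank, singular_values).
sol = np.linalg.lstsq(features, labels, rcond=None)[0]
print(np.allclose(sol, np.linalg.pinv(features).dot(labels)))  # True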

More generally, you can do this:

def get_model(A, y, lamb=0):
    n_col = A.shape[1]
    # Solve (A^T A + lamb * I) x = A^T y. lstsq returns a tuple
    # (solution, residuals, rank, singular_values); take element 0.
    return np.linalg.lstsq(A.T.dot(A) + lamb * np.identity(n_col),
                           A.T.dot(y), rcond=None)[0]
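A quick sanity check (hypothetical random data, assuming the get_model above is in scope): with the default lamb=0 it matches the unregularized pinv solution on well-conditioned inputs, and a positive lamb shrinks the coefficients as expected.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 3))
y = rng.normal(size=100)

print(np.allclose(get_model(A, y), np.linalg.pinv(A).dot(y)))  # True
# Ridge regularization shrinks the solution norm.
print(np.linalg.norm(get_model(A, y, lamb=100.0))
      < np.linalg.norm(get_model(A, y)))                       # True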
answered Nov 13 '22 by nullas