 

Difference between LinearRegression() and Ridge(alpha=0)

The Tikhonov (ridge) cost becomes equivalent to the least-squares cost as the alpha parameter approaches zero. Everything in the scikit-learn docs on the subject indicates the same. Therefore I expected

sklearn.linear_model.Ridge(alpha=1e-100).fit(data, target)

to be equivalent to

sklearn.linear_model.LinearRegression().fit(data, target)

But that's not the case. Why?
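For reference, the objective Ridge minimizes (per the scikit-learn docs) is the penalized least-squares cost

\min_w \, \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_2^2

which reduces to the ordinary least-squares cost as \alpha \to 0, so the two fits should coincide in the limit.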

Updated with code:

import pandas as pd
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt
%matplotlib inline

dataset = pd.read_csv('house_price_data.csv')

X = dataset['sqft_living'].values.reshape(-1, 1)
Y = dataset['price'].values.reshape(-1, 1)

polyX = PolynomialFeatures(degree=15).fit_transform(X)

model1 = LinearRegression().fit(polyX, Y)
model2 = Ridge(alpha=1e-100).fit(polyX, Y)

plt.plot(X, Y,'.',
         X, model1.predict(polyX),'g-',
         X, model2.predict(polyX),'r-')

Note: the plot looks the same for alpha=1e-8 or alpha=1e-100

[plot: sqft_living vs. price with model1's fit in green and model2's fit in red]

asked Nov 13 '16 by Daniel Salvadori

People also ask

What is the difference between Linearregression and Ridge?

Linear Regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line). Ridge Regression is a technique used when the data suffers from multicollinearity (independent variables are highly correlated).
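A minimal sketch (synthetic data, assumed for illustration, not from the question) showing the difference on a collinear design:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
x = rng.randn(100)
# Two nearly identical (highly collinear) features.
X = np.column_stack([x, x + 1e-6 * rng.randn(100)])
y = X[:, 0] + X[:, 1] + 0.1 * rng.randn(100)

print("OLS coefs:  ", LinearRegression().fit(X, y).coef_)  # typically huge and unstable
print("Ridge coefs:", Ridge(alpha=1.0).fit(X, y).coef_)    # shrunk back toward [1, 1]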

What is the difference between Lasso and Ridge?

Similar to lasso regression, ridge regression constrains the coefficients by introducing a penalty factor. However, while lasso regression penalizes the absolute magnitude of the coefficients, ridge regression penalizes their squares. Ridge regression is also referred to as L2 regularization.
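A minimal sketch (synthetic data, assumed for illustration) of the two penalties in action:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
# Only features 0 and 3 actually matter.
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + 0.1 * rng.randn(100)

# The L1 penalty drives the irrelevant coefficients exactly to zero...
print("Lasso coefs:", Lasso(alpha=0.5).fit(X, y).coef_)
# ...while the L2 penalty only shrinks all coefficients toward zero.
print("Ridge coefs:", Ridge(alpha=0.5).fit(X, y).coef_)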

What is the difference between ordinary least squares and ridge regression?

Ridge regression is a term used to refer to a linear regression model whose coefficients are not estimated by ordinary least squares (OLS), but by an estimator, called ridge estimator, that is biased but has lower variance than the OLS estimator.
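In formulas (a standard result, not specific to scikit-learn): with design matrix X and targets y,

\hat{w}_{OLS} = (X^\top X)^{-1} X^\top y
\hat{w}_{ridge} = (X^\top X + \alpha I)^{-1} X^\top y

Adding \alpha I keeps the matrix invertible even when X^\top X is singular or ill-conditioned, which is what buys the ridge estimator its lower variance at the cost of bias.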

What are lasso and ridge regression?

Brief Overview. Ridge and lasso regression are powerful techniques generally used for creating parsimonious models in the presence of a 'large' number of features. Here 'large' can typically mean either of two things: large enough to enhance the tendency of a model to overfit (as low as 10 variables might cause overfitting ...


1 Answer

According to the documentation, alpha must be a positive float, but your example has alpha=0 as an integer. With a small positive alpha, the results of Ridge and LinearRegression appear to converge:

from sklearn.linear_model import Ridge, LinearRegression
data = [[0, 0], [1, 1], [2, 2]]
target = [0, 1, 2]

ridge_model = Ridge(alpha=1e-8).fit(data, target)
print("RIDGE COEFS: " + str(ridge_model.coef_))
ols = LinearRegression().fit(data, target)
print("OLS COEFS: " + str(ols.coef_))

# RIDGE COEFS: [ 0.49999999  0.50000001]
# OLS COEFS: [ 0.5  0.5]
#
# VS. with alpha=0:
# RIDGE COEFS: [  1.57009246e-16   1.00000000e+00]
# OLS COEFS: [ 0.5  0.5]

UPDATE: Passing alpha=0 as an int only seems to be a problem for a few toy problems like the example above.

For the housing data, the issue is one of scaling. The degree-15 polynomial expansion of the raw square-footage values produces enormous feature magnitudes, which makes the problem numerically unstable. To produce identical results from LinearRegression and Ridge, try scaling your data first:

import pandas as pd
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.preprocessing import PolynomialFeatures, scale

dataset = pd.read_csv('house_price_data.csv')

# scale the X data to prevent numerical errors.
X = scale(dataset['sqft_living'].values.reshape(-1, 1))
Y = dataset['price'].values.reshape(-1, 1)

polyX = PolynomialFeatures(degree=15).fit_transform(X)

model1 = LinearRegression().fit(polyX, Y)
model2 = Ridge(alpha=0).fit(polyX, Y)

print("OLS Coefs: " + str(model1.coef_[0]))
print("Ridge Coefs: " + str(model2.coef_[0]))

#OLS Coefs: [  0.00000000e+00   2.69625315e+04   3.20058010e+04  -8.23455994e+04
#  -7.67529485e+04   1.27831360e+05   9.61619464e+04  -8.47728622e+04
#  -5.67810971e+04   2.94638384e+04   1.60272961e+04  -5.71555266e+03
#  -2.10880344e+03   5.92090729e+02   1.03986456e+02  -2.55313741e+01]
#Ridge Coefs: [  0.00000000e+00   2.69625315e+04   3.20058010e+04  -8.23455994e+04
#  -7.67529485e+04   1.27831360e+05   9.61619464e+04  -8.47728622e+04
#  -5.67810971e+04   2.94638384e+04   1.60272961e+04  -5.71555266e+03
#  -2.10880344e+03   5.92090729e+02   1.03986456e+02  -2.55313741e+01]
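As a follow-up, the same scale-then-expand-then-fit sequence can be packaged in a scikit-learn Pipeline so that the scaling is applied consistently at both fit and predict time. A minimal sketch, assuming the same house_price_data.csv columns as above:

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv('house_price_data.csv')
X = dataset[['sqft_living']].values  # 2-D array with one feature column
Y = dataset['price'].values

# Standardize X, expand to a degree-15 polynomial, then fit OLS in one object.
model = make_pipeline(StandardScaler(),
                      PolynomialFeatures(degree=15),
                      LinearRegression())
model.fit(X, Y)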
answered Sep 29 '22 by Ryan Walker