In sklearn's LinearRegression, what exactly does the fit_intercept parameter do? [closed]

sklearn.linear_model.LinearRegression has a parameter fit_intercept that can be set to True or False. If we set it to True, does it add an additional intercept column of all 1's to the dataset? If I already have a dataset with a column of 1's, does fit_intercept=False account for that, or does it force a zero-intercept model?

Update: It seems people do not get my question. The question is: what if I already have a column of 1's in my dataset of predictors (the 1's are for the intercept)? Then:

  1. If I use fit_intercept=False, will it remove the column of 1's?

  2. If I use fit_intercept=True, will it add an EXTRA column of 1's?

asked Oct 16 '17 at 21:10 by user321627


1 Answer

fit_intercept=False sets the y-intercept to 0. If fit_intercept=True, the y-intercept will be determined by the line of best fit.

    from sklearn.linear_model import LinearRegression
    from sklearn.datasets import make_regression
    import numpy as np
    import matplotlib.pyplot as plt

    bias = 100

    X = np.arange(1000).reshape(-1,1)
    y_true = np.ravel(X.dot(0.3) + bias)
    noise = np.random.normal(0, 60, 1000)
    y = y_true + noise

    lr_fi_true = LinearRegression(fit_intercept=True)
    lr_fi_false = LinearRegression(fit_intercept=False)

    lr_fi_true.fit(X, y)
    lr_fi_false.fit(X, y)

    print('Intercept when fit_intercept=True : {:.5f}'.format(lr_fi_true.intercept_))
    print('Intercept when fit_intercept=False : {:.5f}'.format(lr_fi_false.intercept_))

    lr_fi_true_yhat = np.dot(X, lr_fi_true.coef_) + lr_fi_true.intercept_
    lr_fi_false_yhat = np.dot(X, lr_fi_false.coef_) + lr_fi_false.intercept_

    plt.scatter(X, y, label='Actual points')
    plt.plot(X, lr_fi_true_yhat, 'r--', label='fit_intercept=True')
    plt.plot(X, lr_fi_false_yhat, 'r-', label='fit_intercept=False')
    plt.legend()

    plt.vlines(0, 0, y.max())
    plt.hlines(bias, X.min(), X.max())
    plt.hlines(0, X.min(), X.max())

    plt.show()

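This example prints something like the following (no random seed is set, so the first value varies slightly between runs):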

    Intercept when fit_intercept=True : 100.32210
    Intercept when fit_intercept=False : 0.00000

Visually it becomes clear what fit_intercept does. When fit_intercept=True, the intercept is estimated from the data, so the line of best fit can cross the y-axis wherever the data put it (close to 100 in this example). When fit_intercept=False, the intercept is fixed at 0 and the line is forced through the origin (0, 0).

[Figure: fit_intercept in sklearn]
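The two behaviours can also be checked against the textbook closed-form formulas for a single feature. This is only a sketch: the seed and the variable names (`slope_origin`, `slope_free`, `intercept_free`) are my own choices for illustration, and the data merely mimic the example above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for this sketch
x = np.arange(1000, dtype=float)
y = 0.3 * x + 100 + rng.normal(0, 60, size=x.size)
X = x.reshape(-1, 1)

# Line through the origin: slope = sum(x*y) / sum(x*x), intercept fixed at 0
slope_origin = np.sum(x * y) / np.sum(x * x)

# Line with a free intercept: slope from the centered data,
# intercept = mean(y) - slope * mean(x)
xc, yc = x - x.mean(), y - y.mean()
slope_free = np.sum(xc * yc) / np.sum(xc * xc)
intercept_free = y.mean() - slope_free * x.mean()

lr_false = LinearRegression(fit_intercept=False).fit(X, y)
lr_true = LinearRegression(fit_intercept=True).fit(X, y)

print(np.isclose(lr_false.coef_[0], slope_origin))
print(np.isclose(lr_true.coef_[0], slope_free))
print(np.isclose(lr_true.intercept_, intercept_free))
```

If the three comparisons agree, fit_intercept=False is ordinary least squares through the origin, and fit_intercept=True is ordinary least squares with the intercept recovered from the means.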


What happens if I include a column of ones or zeros and set fit_intercept to True or False?

The example below shows how to inspect this.

    from sklearn.linear_model import LinearRegression
    from sklearn.datasets import make_regression
    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(1)
    bias = 100

    X = np.arange(1000).reshape(-1,1)
    y_true = np.ravel(X.dot(0.3) + bias)
    noise = np.random.normal(0, 60, 1000)
    y = y_true + noise

    # with column of ones
    X_with_ones = np.hstack((np.ones((X.shape[0], 1)), X))

    for b,data in ((True, X), (False, X), (True, X_with_ones), (False, X_with_ones)):
      lr = LinearRegression(fit_intercept=b)
      lr.fit(data, y)

      print(lr.intercept_, lr.coef_)

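Take-away:

    # fit_intercept=True, no column of zeros or ones
    104.156765787 [ 0.29634031]
    # fit_intercept=False, no column of zeros or ones
    0.0 [ 0.45265361]
    # fit_intercept=True, column of zeros or ones
    104.156765787 [ 0.          0.29634031]
    # fit_intercept=False, column of zeros or ones
    0.0 [ 104.15676579    0.29634031]

So a column of 1's is neither removed nor duplicated. With fit_intercept=True the existing column of 1's is redundant and gets a coefficient of 0, while the intercept is reported in intercept_. With fit_intercept=False the column of 1's is treated like any other feature: its coefficient (104.16 here) plays the role of the intercept, and intercept_ is reported as 0.0.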
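To double-check the output above without comparing numbers by eye, the three fits can be compared programmatically. This is a sketch under my own assumptions: the seed and the names `baseline`, `ones_true` and `ones_false` are illustrative and do not come from the answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)  # arbitrary seed for this sketch
X = np.arange(1000, dtype=float).reshape(-1, 1)
y = 0.3 * X.ravel() + 100 + rng.normal(0, 60, size=X.shape[0])
X_with_ones = np.hstack((np.ones((X.shape[0], 1)), X))

baseline = LinearRegression(fit_intercept=True).fit(X, y)
ones_true = LinearRegression(fit_intercept=True).fit(X_with_ones, y)
ones_false = LinearRegression(fit_intercept=False).fit(X_with_ones, y)

# fit_intercept=False + column of ones: the ones coefficient acts as the intercept
print(np.isclose(ones_false.coef_[0], baseline.intercept_))

# fit_intercept=True + column of ones: the constant column contributes nothing
print(np.isclose(ones_true.coef_[0], 0.0))

# All three models should produce the same predictions
print(np.allclose(ones_false.predict(X_with_ones), baseline.predict(X)))
print(np.allclose(ones_true.predict(X_with_ones), baseline.predict(X)))
```

In practice this suggests picking one convention: either supply your own column of 1's and pass fit_intercept=False, or leave the column out and keep the default fit_intercept=True.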
answered Sep 22 '22 at 12:09 by Jarad