
How to get a regression summary in scikit-learn like R does?

As an R user, I also wanted to get up to speed on scikit-learn.

Creating linear regression models is fine, but I can't seem to find a reasonable way to get a standard summary of the regression output.

Code example:

    # Linear Regression
    import numpy as np
    from sklearn import datasets
    from sklearn.linear_model import LinearRegression

    # Load the diabetes dataset
    dataset = datasets.load_diabetes()

    # Fit a linear regression model to the data
    model = LinearRegression()
    model.fit(dataset.data, dataset.target)
    print(model)

    # Make predictions
    expected = dataset.target
    predicted = model.predict(dataset.data)

    # Summarize the fit of the model
    mse = np.mean((predicted - expected) ** 2)
    print(model.intercept_, model.coef_, mse, end=" ")
    print(model.score(dataset.data, dataset.target))

Issues:

  • The intercept and coefficients seem to be built into the model, and I just print them (second-to-last line) to see them.
  • What about all the other standard regression output, such as R^2, adjusted R^2, p-values, etc.? If I read the examples correctly, it seems you have to write a function/equation for each of these and then print it.
  • So, is there no standard summary output for linear regression models?
  • Also, my printed array of coefficients has no variable names attached; I just get the numeric array. Is there a way to print each coefficient alongside the variable it belongs to?

My printed output:

    LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
    152.133484163 [ -10.01219782 -239.81908937  519.83978679  324.39042769 -792.18416163
      476.74583782  101.04457032  177.06417623  751.27932109   67.62538639] 2859.69039877 0.517749425413

Notes: I started off with Linear, Ridge and Lasso, and I have gone through the examples. The code above is for basic OLS.

asked Oct 11 '14 21:10 by mpg



1 Answer

sklearn has no R-style regression summary report. The main reason is that sklearn is used for predictive modelling / machine learning, where the evaluation criteria are based on performance on previously unseen data (such as predictive R^2 for regression).
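To illustrate what "performance on previously unseen data" can look like, here is a minimal sketch that holds out part of the diabetes data and compares R^2 on the training and test portions. The 25% split and the `random_state` value are arbitrary choices made for this example, not something taken from the question.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Same dataset as in the question, returned as plain arrays
X, y = load_diabetes(return_X_y=True)

# Hold out 25% of the rows (split size and seed are arbitrary assumptions)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# In-sample R^2: computed on the rows the model was fitted to
r2_train = model.score(X_train, y_train)

# Predictive R^2: computed on rows the model has never seen
r2_test = r2_score(y_test, model.predict(X_test))

print("R^2 on training data:", r2_train)
print("R^2 on held-out data:", r2_test)
```

The held-out value is the one sklearn's tooling is built around; it can be noticeably lower than the in-sample value and can even be negative for a poor model.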

There is a summary function for classification, sklearn.metrics.classification_report, which calculates several types of (predictive) scores for a classification model.
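For reference, a small sketch of how `classification_report` might be used. The iris dataset, the `LogisticRegression` classifier, and the split settings are assumptions picked only to have something to report on.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Example classification data (an arbitrary choice for this sketch)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Any classifier would do; max_iter is raised so the solver converges
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Text table with precision, recall, f1-score and support per class
report = classification_report(y_test, y_pred)
print(report)
```

Note that the scores are computed on the held-out rows, which fits the predictive focus described above.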

For a more classic statistical approach, take a look at statsmodels.

answered Sep 20 '22 23:09 by eickenberg