As an R user, I also wanted to get up to speed on scikit-learn.
Creating linear regression models is fine, but I can't seem to find a reasonable way to get a standard summary of the regression output.
Code example:
    # Linear Regression
    import numpy as np
    from sklearn import datasets
    from sklearn.linear_model import LinearRegression

    # Load the diabetes dataset
    dataset = datasets.load_diabetes()

    # Fit a linear regression model to the data
    model = LinearRegression()
    model.fit(dataset.data, dataset.target)
    print(model)

    # Make predictions on the training data
    expected = dataset.target
    predicted = model.predict(dataset.data)

    # Summarize the fit of the model: mean squared error, then R^2
    mse = np.mean((predicted - expected) ** 2)
    print(model.intercept_, model.coef_, mse)
    print(model.score(dataset.data, dataset.target))
Issues: the intercept and coefficients seem to be stored on the model, and I just use print (second to last line) to see them.
My printed output:
    LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
    152.133484163 [ -10.01219782 -239.81908937 519.83978679 324.39042769 -792.18416163 476.74583782 101.04457032 177.06417623 751.27932109 67.62538639] 2859.69039877 0.517749425413
Notes: Started off with Linear, Ridge and Lasso. I have gone through the examples. Below is for the basic OLS.
Linear Regression Theory
Linear regression predicts the value of a dependent variable (y) from a given independent variable (x). The technique finds a linear relationship between x (input) and y (output), hence the name linear regression.
R, the multiple correlation coefficient, is the linear correlation between the observed and model-predicted values of the dependent variable. A large value indicates a strong relationship. R Square, the coefficient of determination, is the square of the multiple correlation coefficient.
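To make the relationship between R and R Square concrete, here is a minimal sketch that reuses the diabetes data from the question. The variable names (multiple_r, r_squared) are mine, and the equality with model.score only holds here because this is ordinary least squares with an intercept, scored on the same data it was fitted on.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

dataset = datasets.load_diabetes()
X, y = dataset.data, dataset.target

model = LinearRegression().fit(X, y)
predicted = model.predict(X)

# Multiple correlation coefficient: Pearson correlation between
# the observed values and the model-predicted values.
multiple_r = np.corrcoef(y, predicted)[0, 1]

# Coefficient of determination: the square of the multiple correlation.
r_squared = multiple_r ** 2

# For in-sample OLS with an intercept, this agrees with model.score.
print(multiple_r, r_squared, model.score(X, y))
```

On held-out data the two quantities can differ, so the squared correlation should not be treated as a general substitute for model.score.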
There is no R-style regression summary report in sklearn. The main reason is that sklearn is geared toward predictive modelling / machine learning, where the evaluation criteria are based on performance on previously unseen data (such as predictive R^2 for regression).
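As a rough illustration of that "unseen data" style of evaluation, the sketch below holds out part of the diabetes data and scores the model only on the held-out rows. The 25% split and random_state=0 are arbitrary choices of mine, not something the answer prescribes.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

dataset = datasets.load_diabetes()

# Hold out 25% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
predicted = model.predict(X_test)

# Predictive scores computed on the held-out rows only.
test_r2 = r2_score(y_test, predicted)
test_mse = mean_squared_error(y_test, predicted)
print("held-out R^2:", test_r2)
print("held-out MSE:", test_mse)
```

The held-out R^2 will generally be lower than the in-sample value printed in the question, which is the point of evaluating this way.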
There does exist a summary function for classification, sklearn.metrics.classification_report, which calculates several types of (predictive) scores on a classification model.
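For reference, a small sketch of what classification_report produces. The iris dataset, the LogisticRegression classifier, and the split settings are just assumptions for the sake of a runnable example; any fitted classifier would do.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

# max_iter is raised so the solver converges on this data.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class precision, recall, F1 score and support, as a text table.
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```

Note that this reports predictive scores per class; it is not a table of coefficients and p-values like R's summary().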
For a more classic statistical approach, take a look at statsmodels.