How to get a regression summary in scikit-learn like R does?

Tags: python, r, scikit-learn, linear-regression, summary

As an R user, I also wanted to get up to speed on scikit-learn.

Creating linear regression models is fine, but I can't seem to find a reasonable way to get a standard summary of the regression output.

Code example:

    # Linear Regression
    import numpy as np
    from sklearn import datasets
    from sklearn.linear_model import LinearRegression

    # Load the diabetes datasets
    dataset = datasets.load_diabetes()

    # Fit a linear regression model to the data
    model = LinearRegression()
    model.fit(dataset.data, dataset.target)
    print(model)

    # Make predictions
    expected = dataset.target
    predicted = model.predict(dataset.data)

    # Summarize the fit of the model
    mse = np.mean((predicted-expected)**2)
    print(model.intercept_, model.coef_, mse, end=" ")
    print(model.score(dataset.data, dataset.target))

Issues:

- It seems the intercept and coefficients are stored on the model, and I just call print (second to last line) to see them.
- What about all the other standard regression output, such as R^2, adjusted R^2, p-values, etc.? If I read the examples correctly, it seems you have to write a function/equation for each of these and then print it.
- So, is there no standard summary output for linear regression models?
- Also, my printed array of coefficients has no variable names attached; I just get the numeric array. Is there a way to print each coefficient alongside the variable it goes with?

My printed output:

    LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
    152.133484163 [ -10.01219782 -239.81908937  519.83978679  324.39042769 -792.18416163
      476.74583782  101.04457032  177.06417623  751.27932109   67.62538639] 2859.69039877 0.517749425413

Notes: Started off with Linear, Ridge and Lasso. I have gone through the examples. Below is for the basic OLS.

275

asked Oct 11 '14 21:10

mpg

1 Answer

sklearn has no R-style regression summary report. The main reason is that sklearn is aimed at predictive modelling / machine learning, where models are evaluated by their performance on previously unseen data (for example, predictive R^2 for regression).
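To make the "previously unseen data" point concrete, here is a minimal sketch that scores the same diabetes model on held-out rows. The 25% split and the seed are arbitrary choices of mine, not something from the question.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

dataset = datasets.load_diabetes()

# Hold out 25% of the rows; the split size and seed are arbitrary assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R^2. On the training rows it is the usual in-sample value;
# on the held-out rows it is a predictive R^2.
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)
print("in-sample R^2: %.3f" % r2_train)
print("held-out  R^2: %.3f" % r2_test)
```

The held-out value is the one sklearn's workflow is built around, which is why an in-sample table of p-values is not part of the library.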

There is a summary function for classification, sklearn.metrics.classification_report, which computes several (predictive) scores for a classification model.
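For reference, a small sketch of what that report looks like. The iris data, the logistic regression classifier, and the split settings are my own picks for illustration only.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Stratified hold-out split; size and seed are arbitrary assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The report is a plain string with per-class precision, recall, f1-score and support.
report = classification_report(
    y_test, clf.predict(X_test), target_names=iris.target_names
)
print(report)
```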

For a more classic statistical approach, take a look at statsmodels.

148

answered Sep 20 '22 23:09

eickenberg
