If I use the seaborn library in Python to plot the result of a linear regression, is there a way to find out the numerical results of the regression? For example, I might want to know the fitting coefficients or the R² of the fit.
I could re-run the same fit using the underlying statsmodels interface, but that would seem to be unnecessary duplicate effort, and anyway I'd want to be able to compare the resulting coefficients to be sure the numerical results are the same as what I'm seeing in the plot.
Regression plots in seaborn can be created with the lmplot() function. lmplot() creates a linear model plot: a scatter plot of the data with a simple linear regression fit drawn on top of it.
When reporting such a fit, you would typically give R² first, followed by whether the model is a significant predictor of the outcome variable (using the results of the ANOVA for the regression), and then the beta values for the predictors and the significance of their contribution to the model.
regplot(): this function plots data together with a linear regression model fit. There are a number of mutually exclusive options for estimating the regression model.
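For instance, here is a minimal sketch using seaborn's built-in tips dataset (the dataset and column names are purely illustrative):

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# lmplot: figure-level, creates its own figure (scatter + fitted line)
sns.lmplot(x="total_bill", y="tip", data=tips)

# regplot: axes-level, draws onto an existing Axes
fig, ax = plt.subplots()
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax)

plt.show()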
There's no way to do this.
In my opinion, asking a visualization library to give you statistical modeling results is backwards. statsmodels, a modeling library, lets you fit a model and then draw a plot that corresponds exactly to the model you fit. If you want that exact correspondence, this order of operations makes more sense to me.
You might say "but the plots in statsmodels don't have as many aesthetic options as seaborn". But I think that makes sense: statsmodels is a modeling library that sometimes uses visualization in the service of modeling, while seaborn is a visualization library that sometimes uses modeling in the service of visualization. It is good to specialize, and bad to try to do everything.
Fortunately, both seaborn and statsmodels use tidy data. That means you need very little duplicated effort to get both plots and models through the appropriate tools.
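As a concrete sketch of that workflow (using seaborn's tips dataset purely as an example), you can fit the model with statsmodels, read off the coefficients and R², and plot the same tidy data with seaborn:

import seaborn as sns
import statsmodels.formula.api as smf

tips = sns.load_dataset("tips")

# fit with the modeling library...
model = smf.ols("tip ~ total_bill", data=tips).fit()
print(model.params)    # intercept and slope
print(model.rsquared)  # R²

# ...and plot the same data with the visualization library
sns.regplot(x="total_bill", y="tip", data=tips)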
Seaborn's creator has unfortunately stated that he won't add such a feature. Below are some options. (The last section contains my original suggestion, which was a hack that used private implementation details of seaborn and was not particularly flexible.)
A simple alternative to sns.regplot
The following function overlays a fit line on a scatter plot and returns the results from statsmodels. This supports the simplest and perhaps most common usage for sns.regplot, but does not implement any of the fancier functionality.
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm


def simple_regplot(
    x, y, n_std=2, n_pts=100, ax=None, scatter_kws=None, line_kws=None, ci_kws=None
):
    """Draw a regression line with error interval."""
    ax = plt.gca() if ax is None else ax

    # calculate best-fit line and interval
    x_fit = sm.add_constant(x)
    fit_results = sm.OLS(y, x_fit).fit()

    eval_x = sm.add_constant(np.linspace(np.min(x), np.max(x), n_pts))
    pred = fit_results.get_prediction(eval_x)

    # draw the fit line and error interval
    ci_kws = {} if ci_kws is None else ci_kws
    ax.fill_between(
        eval_x[:, 1],
        pred.predicted_mean - n_std * pred.se_mean,
        pred.predicted_mean + n_std * pred.se_mean,
        alpha=0.5,
        **ci_kws,
    )
    line_kws = {} if line_kws is None else line_kws
    h = ax.plot(eval_x[:, 1], pred.predicted_mean, **line_kws)

    # draw the scatterplot
    scatter_kws = {} if scatter_kws is None else scatter_kws
    ax.scatter(x, y, c=h[0].get_color(), **scatter_kws)

    return fit_results
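For example, with some synthetic data (a usage sketch that assumes the function above is in scope):

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3 * x + rng.normal(size=100)

fit_results = simple_regplot(x, y)
plt.show()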
The results from statsmodels contain a wealth of information, e.g.:
>>> print(fit_results.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.477
Model:                            OLS   Adj. R-squared:                  0.471
Method:                 Least Squares   F-statistic:                     89.23
Date:                Fri, 08 Jan 2021   Prob (F-statistic):           1.93e-15
Time:                        17:56:00   Log-Likelihood:                -137.94
No. Observations:                 100   AIC:                             279.9
Df Residuals:                      98   BIC:                             285.1
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.1417      0.193     -0.735      0.464      -0.524       0.241
x1             3.1456      0.333      9.446      0.000       2.485       3.806
==============================================================================
Omnibus:                        2.200   Durbin-Watson:                   1.777
Prob(Omnibus):                  0.333   Jarque-Bera (JB):                1.518
Skew:                          -0.002   Prob(JB):                        0.468
Kurtosis:                       2.396   Cond. No.                         4.35
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
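Individual quantities are also available as attributes and methods of the statsmodels results object, for example:

print(fit_results.params)      # fitted coefficients (const, x1)
print(fit_results.rsquared)    # R²
print(fit_results.pvalues)     # p-values for each coefficient
print(fit_results.conf_int())  # 95% confidence intervals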
A drop-in replacement (almost) for sns.regplot
The advantage of the method above over my original answer below is that it's easy to extend it to more complex fits.
Shameless plug: here is such an extended regplot function that I wrote that implements a large fraction of sns.regplot's functionality: https://github.com/ttesileanu/pydove.
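A hypothetical usage sketch (the module name and call signature here are assumptions based on the description above; check the linked repository for the actual API):

import pydove as dv
import seaborn as sns

tips = sns.load_dataset("tips")

# assumed interface: plots like sns.regplot, returns the statsmodels results
res = dv.regplot(x="total_bill", y="tip", data=tips)
print(res.summary())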
While some features are still missing, the function I wrote uses statsmodels to calculate confidence intervals instead of using bootstrapping, and it allows for more complex fits (e.g., against log(x)).

Old answer

Seaborn's creator has unfortunately stated that he won't add such a feature, so here's a workaround.
import copy

import matplotlib.pyplot as plt
import seaborn as sns


def regplot(*args, line_kws=None, marker=None, scatter_kws=None, **kwargs):
    # this is the class that `sns.regplot` uses
    plotter = sns.regression._RegressionPlotter(*args, **kwargs)

    # this is essentially the code from `sns.regplot`
    ax = kwargs.get("ax", None)
    if ax is None:
        ax = plt.gca()

    scatter_kws = {} if scatter_kws is None else copy.copy(scatter_kws)
    scatter_kws["marker"] = marker
    line_kws = {} if line_kws is None else copy.copy(line_kws)

    plotter.plot(ax, scatter_kws, line_kws)

    # unfortunately the regression results aren't stored, so we rerun
    grid, yhat, err_bands = plotter.fit_regression(plt.gca())

    # also unfortunately, this doesn't return the parameters, so we infer them
    slope = (yhat[-1] - yhat[0]) / (grid[-1] - grid[0])
    intercept = yhat[0] - slope * grid[0]

    return slope, intercept
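Usage mirrors sns.regplot; for example (a sketch assuming the function above is in scope, with the tips dataset as illustration):

tips = sns.load_dataset("tips")

slope, intercept = regplot(x="total_bill", y="tip", data=tips)
print(slope, intercept)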
Note that this only works for linear regression, because it simply infers the slope and intercept from the regression results. The nice thing is that it uses seaborn's own regression class, so the results are guaranteed to be consistent with what's shown. The downside is, of course, that we're using a private implementation detail in seaborn that can break at any point.
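If you want the reassurance mentioned in the question, you can compare the inferred parameters against an independent statsmodels fit of the same data; the two should agree up to floating-point error (again a sketch, assuming the regplot hack above is in scope):

import numpy as np
import seaborn as sns
import statsmodels.api as sm

tips = sns.load_dataset("tips")
slope, intercept = regplot(x="total_bill", y="tip", data=tips)

# independent OLS fit of the same data
fit = sm.OLS(tips["tip"], sm.add_constant(tips["total_bill"])).fit()
print(np.allclose([intercept, slope], fit.params))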