
How to plot multiple linear regressions in the same figure

Given the following:

import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(365)
x1 = np.random.randn(50)
y1 = np.random.randn(50) * 100
x2 = np.random.randn(50)
y2 = np.random.randn(50) * 100

df1 = pd.DataFrame({'x1':x1, 'y1': y1})
df2 = pd.DataFrame({'x2':x2, 'y2': y2})

sns.lmplot(x='x1', y='y1', data=df1, fit_reg=True, ci=None)
sns.lmplot(x='x2', y='y2', data=df2, fit_reg=True, ci=None)

This will create 2 separate plots. How can I add the data from df2 onto the SAME graph? All the seaborn examples I have found online seem to focus on how you can create adjacent graphs (say, via the 'hue' and 'col_wrap' options). Also, I prefer not to use the dataset examples where an additional column might be present as this does not have a natural meaning in the project I am working on.

If there is a mixture of matplotlib/seaborn functions that are required to achieve this, I would be grateful if someone could help illustrate.

laszlopanaflex asked Mar 16 '16 03:03




2 Answers

Option 1: sns.regplot

  • The simplest solution here is sns.regplot, an axes-level function: both regressions can be drawn on the same Axes without combining df1 and df2.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# create the figure and axes
fig, ax = plt.subplots(figsize=(6, 6))

# add the plots for each dataframe
sns.regplot(x='x1', y='y1', data=df1, fit_reg=True, ci=None, ax=ax, label='df1')
sns.regplot(x='x2', y='y2', data=df2, fit_reg=True, ci=None, ax=ax, label='df2')
ax.set(ylabel='y', xlabel='x')
ax.legend()
plt.show()
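
The same idea extends to any number of DataFrames. The following is a minimal sketch, not part of the original answer: regplot_many is a hypothetical helper, and it assumes the first two columns of each DataFrame are the x and y values.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def regplot_many(frames, ax=None):
    """Draw one scatter plot and regression line per DataFrame on a shared Axes.

    `frames` maps a legend label to a DataFrame; the first two columns of
    each DataFrame are taken as x and y (an assumption of this sketch).
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 6))
    for label, frame in frames.items():
        x_col, y_col = frame.columns[:2]
        sns.regplot(x=x_col, y=y_col, data=frame, ci=None, ax=ax, label=label)
    # regplot labels the axes with the last column names, so set common labels
    ax.set(xlabel='x', ylabel='y')
    ax.legend()
    return ax


# sample data built the same way as in the question
np.random.seed(365)
df1 = pd.DataFrame({'x1': np.random.randn(50), 'y1': np.random.randn(50) * 100})
df2 = pd.DataFrame({'x2': np.random.randn(50), 'y2': np.random.randn(50) * 100})

ax = regplot_many({'df1': df1, 'df2': df2})
```

Outside a notebook, call plt.show() afterwards to display the figure. Passing an existing ax lets the helper draw onto a subplot created elsewhere.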



Option 2: sns.lmplot

  • The sns.FacetGrid documentation recommends using a figure-level function (here, sns.lmplot) in most cases rather than using FacetGrid directly.
  • Combine df1 and df2 into a long format, and then use sns.lmplot with the hue parameter.
  • When working with seaborn, it is almost always necessary for the data to be in a long format.
    • It's customary to use pandas.DataFrame.stack or pandas.melt to convert DataFrames from wide to long.
    • For this reason, the columns of df1 and df2 must be renamed to common names, and each DataFrame needs an additional identifying column. They can then be concatenated on axis=0 (the default, which gives a long format) instead of axis=1 (a wide format).
  • There are a number of ways to combine the DataFrames:
    1. The combination method in the answer from Primer is fine if combining a few DataFrames.
    2. However, a function, as shown below, is better for combining many DataFrames.
def fix_df(data: pd.DataFrame, name: str) -> pd.DataFrame:
    """return a copy with renamed columns and an added identifying column"""
    # work on a copy so the original dataframe is not modified
    data = data.copy()
    # rename columns to a common name
    data.columns = ['x', 'y']
    # add an identifying value to use with hue
    data['df'] = name
    return data


# create a list of the dataframes
df_list = [df1, df2]

# update the dataframes by calling the function in a list comprehension
df_update_list = [fix_df(v, f'df{i}') for i, v in enumerate(df_list, 1)]

# combine the dataframes
df = pd.concat(df_update_list).reset_index(drop=True)

# plot the dataframe
sns.lmplot(data=df, x='x', y='y', hue='df', ci=None)
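
As an alternative way to build the long DataFrame, pd.concat accepts a dict and can turn its keys into the identifying column. This is a hedged sketch rather than the answer's method: to_long is a hypothetical helper, and it assumes each input has exactly two columns, ordered as x then y.

```python
import numpy as np
import pandas as pd


def to_long(frames):
    """Stack two-column DataFrames into one long frame with columns x, y, df.

    `frames` maps an identifying name to a DataFrame whose columns are
    ordered as x, y (an assumption of this sketch). Inputs are not modified.
    """
    # set_axis returns a new object, so the originals keep their column names
    renamed = {name: frame.set_axis(['x', 'y'], axis=1) for name, frame in frames.items()}
    # the dict keys become an index level named 'df', which is moved to a column
    combined = pd.concat(renamed, names=['df']).reset_index(level='df')
    return combined.reset_index(drop=True)[['x', 'y', 'df']]


# sample data built the same way as in the question
np.random.seed(365)
df1 = pd.DataFrame({'x1': np.random.randn(50), 'y1': np.random.randn(50) * 100})
df2 = pd.DataFrame({'x2': np.random.randn(50), 'y2': np.random.randn(50) * 100})

df = to_long({'df1': df1, 'df2': df2})
```

The resulting df has the same shape as the one produced with fix_df, so it can be passed to sns.lmplot with hue='df' in the same way.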


Notes

  • Package versions used for this answer:
    • pandas v1.2.4
    • seaborn v0.11.1
    • matplotlib v3.3.4
Trenton McKinney answered Oct 27 '22 06:10



You could use seaborn's FacetGrid class to get the desired result. You would need to replace your plotting calls with these lines:

import matplotlib.pyplot as plt

# replaces the two sns.lmplot(...) calls from the question
# rename the columns to common names and tag each row with its source
df = pd.concat([df1.rename(columns={'x1':'x','y1':'y'})
                .join(pd.Series(['df1']*len(df1), name='df')), 
                df2.rename(columns={'x2':'x','y2':'y'})
                .join(pd.Series(['df2']*len(df2), name='df'))],
               ignore_index=True)

pal = dict(df1="red", df2="blue")
# the `size` parameter was renamed to `height` in seaborn 0.9
g = sns.FacetGrid(df, hue='df', palette=pal, height=5);
g.map(plt.scatter, "x", "y", s=50, alpha=.7, linewidth=.5, edgecolor="white")
# robust=True requires statsmodels to be installed
g.map(sns.regplot, "x", "y", ci=None, robust=True)
g.add_legend();

This yields a single plot with the points and regression line of df1 in red and those of df2 in blue, which, if I understand correctly, is what you need.

Note that you will need to pay attention to the regplot parameters and may want to change the values I have used as an example.

  • The ; at the end of a line suppresses the output of the command (I use an IPython notebook, where it would otherwise be visible).
  • The docs give some explanation of the .map() method. In essence, it does just that: it maps a plotting command onto the data. However, it only works with 'low-level' plotting commands like regplot, and not with lmplot, which itself calls regplot behind the scenes.
  • Normally plt.scatter would take the parameters c='none', edgecolor='r' to make non-filled markers. But seaborn interferes with this and forces a color onto the markers, so I don't see an easy/straightforward way to fix it other than manipulating the ax elements after seaborn has produced the plot, which is best addressed as part of a different question.
Primer answered Oct 27 '22 08:10
