
How to visualize a nonlinear relationship in a scatter plot

I want to visually explore the relationship between two variables. The functional form of the relationship is not visible in dense scatter plots like this:

scatter plot

How can I add a lowess smooth to the scatter plot in Python?

Or do you have any other suggestions to visually explore non-linear relationships?

I tried the following but it didn't work properly (drawing on an example from Michiel de Hoon):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess
x = np.arange(0,10,0.01)
ytrue = np.exp(-x/5.0) + 2*np.sin(x/3.0)

# add random errors with a normal distribution                      
y = ytrue + np.random.normal(size=len(x))
plt.scatter(x,y,color='cyan')

# calculate a smooth curve through the scatter plot
ys = lowess(x, y)
_ = plt.plot(x,ys,'red',linewidth=1)

# draw the true values for comparison
plt.plot(x,ytrue,'green',linewidth=3)

lowess

The lowess smoother (red lines) is strange.

EDIT:

The following scatter plot matrix also includes lowess smoothers (taken from this question on CV):

Does someone have the code for such a graph?

tobip asked May 21 '14 13:05





1 Answer

You could also use seaborn:

import numpy as np
import seaborn as sns

x = np.arange(0, 10, 0.01)
ytrue = np.exp(-x / 5) + 2 * np.sin(x / 3)
y = ytrue + np.random.normal(size=len(x))

# recent seaborn releases require x and y as keyword arguments
sns.regplot(x=x, y=y, lowess=True)
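
If you would rather stay with statsmodels, the red line in the question most likely looks strange for two reasons: lowess takes the response first, as lowess(endog, exog), and it returns a two-column array of sorted x values and fitted values rather than a plain vector. A minimal sketch of the corrected call, assuming the same simulated data as in the question (the seed and frac=0.3 are illustrative choices of mine, not from the original):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = np.arange(0, 10, 0.01)
ytrue = np.exp(-x / 5) + 2 * np.sin(x / 3)
y = ytrue + rng.normal(size=len(x))

# the response comes first: lowess(endog, exog)
# frac is the share of points used for each local fit (the default is 2/3)
smoothed = lowess(y, x, frac=0.3)

# column 0 holds the sorted x values, column 1 the fitted y values
xs, ys = smoothed[:, 0], smoothed[:, 1]

fig, ax = plt.subplots()
ax.scatter(x, y, color="cyan", s=5)
ax.plot(xs, ys, "red", linewidth=1)      # lowess smooth
ax.plot(x, ytrue, "green", linewidth=3)  # true curve for comparison
```

A smaller frac follows the data more closely, a larger one gives a smoother line, so it is worth trying a few values.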


mwaskom answered Nov 09 '22 20:11
