
Adding a line to a matplotlib scatterplot based on a slope

I have a scatter plot built from a DataFrame - it shows a correlation of two variables - Length and Age

import matplotlib.pyplot as plt
df = DataFrame (......)
plt.title('Fish Length vs Age')
plt.xlabel('Length')
plt.ylabel('Age (days)')
plt.scatter(df['length'],df['age'])


Now I want to add a line with a given slope of 0.88 to this scatter plot. How do I do this?

P.S. All the examples I managed to find use points, not slopes, to draw the line.

UPDATE. I re-read the theory, and it turned out that I had made up the idea that the correlation coefficient should be plotted against the data points :) Partly because of an image I had in my head.

However, I am still confused by the line-plotting capabilities of matplotlib.

Denys asked Dec 11 '22 22:12


2 Answers

The correlation coefficient won't give you the slope of the regression line, because your two variables are on different scales; the two only coincide when both variables are standardized. If you would like a scatter plot with a regression line, I would recommend seaborn, which does it in very few lines of code.

To install seaborn,

pip install seaborn

Code example:

import numpy as np
import pandas as pd
import seaborn as sns

# simulate some artificial data
# =====================================
df = pd.DataFrame(np.random.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100), columns=['X', 'Y'])

df

# plot 
# ====================================
sns.set_style('ticks')
# recent seaborn versions require x and y to be passed as keywords
sns.regplot(x='X', y='Y', data=df, ci=None)
sns.despine()  
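
To see numerically why the correlation coefficient is not the slope, here is a minimal sketch on the same kind of simulated data. The fixed seed and the variable names are my own choices for illustration. The least-squares slope equals r rescaled by the ratio of the two standard deviations, so it only matches r when X and Y have the same spread.

```python
import numpy as np

# Fixed seed: an arbitrary choice so the numbers are reproducible
rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100).T

r = np.corrcoef(x, y)[0, 1]             # Pearson correlation coefficient
slope, intercept = np.polyfit(x, y, 1)  # least-squares line y = slope*x + intercept

# The fitted slope is r rescaled by the spread of each variable
slope_from_r = r * y.std() / x.std()

print(f"r = {r:.3f}")
print(f"fitted slope = {slope:.3f}")
print(f"r * sd(y) / sd(x) = {slope_from_r:.3f}")
```

With these parameters the correlation is below 1 while the fitted slope is several times larger, which is why a line drawn with slope r would not follow the data.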


Edit:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# simulate some artificial data
# =====================================
df = pd.DataFrame(np.random.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100), columns=['X', 'Y'])


# plot
# ==============================
fig, ax = plt.subplots()
ax.scatter(df.X, df.Y)

# a slope alone does not fix the position of the line:
# we also need c, the y-value of the line at x_min
slope = 10
c = -100

x_min, x_max = ax.get_xlim()
y_min, y_max = c, c + slope*(x_max-x_min)
ax.plot([x_min, x_max], [y_min, y_max])
ax.set_xlim([x_min, x_max])
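
Newer versions of matplotlib (3.3 and later) also provide ax.axline, which takes a point and a slope directly. The following is a minimal sketch assuming such a version is installed. Note that here c is treated as the y-intercept (the value of the line at x = 0), which differs from the snippet above, where c is the height of the line at x_min. The seed and the Agg backend line are my additions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption); remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Simulated data, same distribution as above; the seed is arbitrary
rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100).T

fig, ax = plt.subplots()
ax.scatter(x, y)

slope = 10
c = -100  # y-intercept: the value of the line at x = 0

# axline draws an unbounded line through the point (0, c) with the given slope
line = ax.axline((0, c), slope=slope, color='r')

fig.canvas.draw()  # render off-screen; call plt.show() instead when working interactively
```

Because the line is unbounded, it keeps spanning the axes when you zoom or change the limits, so there is no need to compute end points by hand.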

Jianxun Li answered Mar 08 '23 23:03


Building on @JianxunLi's answer, you just want to add a line connecting two points.

These two points each have an x and a y coordinate, so for the two points you'll have four numbers: x_0, y_0, x_1, y_1.

Let's assume you want the x coordinates of those two points to span the x axis, so you set x_0 and x_1 manually:

x_0 = 0
x_1 = 5000

Alternatively you can just take the minimum and maximum values from the axis:

x_min, x_max = ax.get_xlim()
x_0 = x_min
x_1 = x_max

The slope of a line is defined as the increase in y divided by the increase in x, which is:

slope = (y_1 - y_0) / (x_1 - x_0)

This rearranges to:

(y_1 - y_0) = slope * (x_1 - x_0)

There are an infinite number of parallel lines with this slope, so we have to fix one of the points to start with. For this example, let's assume you want the line to go through the origin (0, 0):

x_0 = 0 # We already know this as it was set earlier
y_0 = 0

Now you can rearrange the formula for y_1 as:

y_1 = slope * (x_1 - x_0) + y_0

If you know you want the slope to be 0.88 then you can calculate the y position of the other point:

y_1 = 0.88 * (5000 - 0) + 0

For the data you've provided in the question a line with slope 0.88 will fly off the top of the y axis very quickly (y_1 = 4400 in the example above).
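
The rearranged formula can be wrapped in a small helper. This is only a sketch, and the function name line_end_y is made up for illustration:

```python
def line_end_y(slope, x_0, y_0, x_1):
    """Return y_1 for the line through (x_0, y_0) with the given slope."""
    return slope * (x_1 - x_0) + y_0


# Slope 0.88 through the origin, evaluated at x = 5000
y_1 = line_end_y(0.88, x_0=0, y_0=0, x_1=5000)
print(y_1)  # 4400.0
```

Changing x_0 and y_0 moves the line to a different starting point without changing its slope.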

In the example below I've put in a line with slope = 0.03.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# simulate some artificial data
# =====================================
df = pd.DataFrame( { 'Age' : np.random.rand(25) * 160 } )

df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000

# plot those data points
# ==============================
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])

# Now add on a line with a fixed slope of 0.03
slope = 0.03

# A line with a fixed slope can intercept the axis
# anywhere so we're going to have it go through 0,0
x_0 = 0
y_0 = 0

# And we'll have the line stop at x = 5000
x_1 = 5000
y_1 = slope * (x_1 - x_0) + y_0

# Draw these two points with big triangles to make it clear
# where they lie
ax.scatter([x_0, x_1], [y_0, y_1], marker='^', s=150, c='r')

# And now connect them
ax.plot([x_0, x_1], [y_0, y_1], c='r')    

plt.show()

KirstieJane answered Mar 09 '23 00:03