I have a scatter plot built from a DataFrame. It shows the correlation between two variables, Length and Age:
import matplotlib.pyplot as plt
df = DataFrame (......)
plt.title ('Fish Length vs Age')
plt.xlabel('Length')
plt.ylabel('Age (days)')
plt.scatter(df['length'],df['age'])
Now I want to add a line with a given slope of 0.88 to this scatter plot. How do I do this?
P.S. All the examples I managed to find use points, not slopes, to draw the line.
UPDATE. I re-read the theory, and it turned out that the idea that the correlation coefficient should be plotted against the data points was made up by me :) Partially because of this image in my head
However, I am still confused by the line-plotting capabilities of matplotlib.
The correlation coefficient won't give the slope of the regression line, because your data are on different scales. If you would like to plot a scatter with a regression line, I would recommend doing it in seaborn, which takes only a few lines of code.
To install seaborn:
pip install seaborn
Code example:
import numpy as np
import pandas as pd
import seaborn as sns
# simulate some artificial data
# =====================================
df = pd.DataFrame(np.random.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100), columns=['X', 'Y'])
df
# plot
# ====================================
sns.set_style('ticks')
# recent seaborn releases require x and y to be passed as keywords
sns.regplot(x=df.X, y=df.Y, ci=None)
sns.despine()
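To make the point about correlation versus slope concrete, here is a minimal sketch using only numpy. It assumes the same simulated distribution as above (the seed and sample size are my own choices) and checks the standard identity that the least-squares slope equals r multiplied by std(y) / std(x), so the two only coincide when both variables have the same spread.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed is an arbitrary choice)
rng = np.random.default_rng(0)

# Same mean and covariance as the simulated data above, with more samples
xy = rng.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=1000)
x, y = xy[:, 0], xy[:, 1]

# Pearson correlation coefficient: unitless, always between -1 and 1
r = np.corrcoef(x, y)[0, 1]

# Least-squares regression line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# The slope is the correlation rescaled by the ratio of the spreads
slope_from_r = r * y.std() / x.std()

print("correlation r:", r)
print("regression slope:", slope)
print("r * std(y) / std(x):", slope_from_r)
```

For this covariance matrix the population values are r = 0.8 and slope = 8, so the sample estimates should land near those numbers, an order of magnitude apart.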
To draw a line with a slope and position you choose yourself, plain matplotlib works too:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# simulate some artificial data
# =====================================
df = pd.DataFrame(np.random.multivariate_normal([10, 100], [[100, 800], [800, 10000]], size=100), columns=['X', 'Y'])
# plot
# ==============================
fig, ax = plt.subplots()
ax.scatter(df.X, df.Y)
# need a slope and an offset c to fix the position of the line;
# here c is the line's y value at the left edge of the x axis (x_min)
slope = 10
c = -100
x_min, x_max = ax.get_xlim()
y_min, y_max = c, c + slope*(x_max-x_min)
ax.plot([x_min, x_max], [y_min, y_max])
ax.set_xlim([x_min, x_max])
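As a possible shortcut, matplotlib 3.3 and later provide Axes.axline, which accepts one point plus a slope directly, so the end points do not have to be computed by hand. The sketch below is not part of the code above: the data, the seed, and the anchor point (0, -100) are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data, seeded so the run is repeatable
rng = np.random.default_rng(0)
x = rng.normal(10, 10, size=100)
y = 8 * x + 20 + rng.normal(0, 60, size=100)

fig, ax = plt.subplots()
ax.scatter(x, y)

# A slope alone does not fix a line; it also needs one point to pass through.
# Both values here are assumptions chosen for illustration.
slope = 10
anchor = (0, -100)  # the line passes through x = 0, y = -100

# axline draws an infinite line through the anchor point with the given slope
# (requires matplotlib >= 3.3 and linear axis scales when slope is used)
line = ax.axline(anchor, slope=slope, color='r')
```

Call plt.show() afterwards to display the figure. Because the line is infinite, it keeps spanning the axes when you pan or zoom, which a two-point ax.plot segment does not.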
Building on @JinxunLi's answer, you just want to add a line connecting two points.
Each of these two points has an x and a y coordinate, so for the two points you'll have four numbers: x_0, y_0, x_1, y_1.
Let's assume you want the x coordinates of those two points to span the x axis, so you're going to set x_0 and x_1 manually:
x_0 = 0
x_1 = 5000
Alternatively you can just take the minimum and maximum values from the axis:
x_min, x_max = ax.get_xlim()
x_0 = x_min
x_1 = x_max
The slope of a line is defined as (increase in y) / (increase in x), which would be:
slope = (y_1 - y_0) / (x_1 - x_0)
And this can be rearranged to:
(y_1 - y_0) = slope * (x_1 - x_0)
There are an infinite number of parallel lines with this slope, so we'll have to fix one of the points to start off with. For this example, let's assume you want the line to go through the origin (0,0):
x_0 = 0 # We already know this as it was set earlier
y_0 = 0
Now you can rearrange the formula for y_1 as:
y_1 = slope * (x_1 - x_0) + y_0
If you know you want the slope to be 0.88 then you can calculate the y position of the other point:
y_1 = 0.88 * (5000 - 0) + 0
For the data you've provided in the question, a line with slope 0.88 will fly off the top of the y axis very quickly (y_1 = 4400 in the example above).
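The rearranged formula can be wrapped in a small helper to compare slopes quickly. This is only a sketch; line_end_y is a hypothetical name introduced here for illustration, not a matplotlib function.

```python
def line_end_y(slope, x_0, y_0, x_1):
    """Return y_1 for the point at x_1 on the line through (x_0, y_0)."""
    return slope * (x_1 - x_0) + y_0


# Slope from the question: the end point lands far above ages of ~160 days
y_1 = line_end_y(0.88, x_0=0, y_0=0, x_1=5000)

# A much shallower slope keeps the end point within the range of the data
y_1_shallow = line_end_y(0.03, x_0=0, y_0=0, x_1=5000)

print(y_1, y_1_shallow)
```

Comparing the two results against the range of the y axis is a quick way to judge whether a given slope will stay visible on the plot.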
In the example below I've put in a line with slope = 0.03.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# simulate some artificial data
# =====================================
df = pd.DataFrame( { 'Age' : np.random.rand(25) * 160 } )
df['Length'] = df['Age'] * 0.88 + np.random.rand(25) * 5000
# plot those data points
# ==============================
fig, ax = plt.subplots()
ax.scatter(df['Length'], df['Age'])
# Now add on a line with a fixed slope of 0.03
slope = 0.03
# A line with a fixed slope can intercept the axis
# anywhere so we're going to have it go through 0,0
x_0 = 0
y_0 = 0
# And we'll have the line stop at x = 5000
x_1 = 5000
y_1 = slope * (x_1 - x_0) + y_0
# Draw these two points with big triangles to make it clear
# where they lie
ax.scatter([x_0, x_1], [y_0, y_1], marker='^', s=150, c='r')
# And now connect them
ax.plot([x_0, x_1], [y_0, y_1], c='r')
plt.show()