I'm looking to find the distance between each data point and the prediction line. Ideally I would like the results to be displayed in a new column called 'Distance'.
My imports:
import os.path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
%matplotlib inline
Sample of my data:
idx  Exam Results  Hours Studied
0    93            8.232795
1    94            7.879095
2    92            6.972698
3    88            6.854017
4    91            6.043066
5    87            5.510013
6    89            5.509297
My code so far:
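x = df['Hours Studied'].values[:, np.newaxis]  # feature as a 2-D column array, shape (n, 1)
y = df['Exam Results'].values                  # target as a 1-D array, shape (n,)
model = LinearRegression()
model.fit(x, y)
plt.scatter(x, y, color='r')                   # data points
plt.plot(x, model.predict(x), color='k')       # fitted regression line
plt.show()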
Any help would be greatly appreciated. Thanks
You simply need to assign the difference between y and model.predict(x) to a new column (or take the absolute value if you only want the magnitude of the difference):
#df["Distance"] = abs(y - model.predict(x)) # if you only want magnitude
df["Distance"] = y - model.predict(x)
print(df)
# Exam Results Hours Studied Distance
#0 93 8.232795 -0.478739
#1 94 7.879095 1.198511
#2 92 6.972698 0.934043
#3 88 6.854017 -2.838712
#4 91 6.043066 1.714063
#5 87 5.510013 -1.265269
#6 89 5.509297 0.736102
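A self-contained sketch, assuming the seven sample rows from the question make up the whole DataFrame, reproduces the Distance values above. The extra 'Abs Distance' column is an illustrative name (not from the question), and the dashed plt.vlines segments are just one way to show each point's distance on the question's plot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data from the question (assumed to be the full dataset here)
df = pd.DataFrame({
    "Exam Results": [93, 94, 92, 88, 91, 87, 89],
    "Hours Studied": [8.232795, 7.879095, 6.972698, 6.854017,
                      6.043066, 5.510013, 5.509297],
})

x = df["Hours Studied"].values[:, np.newaxis]  # 2-D feature matrix, shape (7, 1)
y = df["Exam Results"].values                   # 1-D target, shape (7,)

model = LinearRegression()
model.fit(x, y)
y_pred = model.predict(x)

# Signed vertical distance (residual): positive means the point lies above the line
df["Distance"] = y - y_pred
# Unsigned version (hypothetical column name), if only the magnitude matters
df["Abs Distance"] = np.abs(df["Distance"])
print(df)

# Draw each residual as a dashed vertical segment from the line to the point
plt.scatter(x, y, color="r")
plt.plot(x, y_pred, color="k")
plt.vlines(x.ravel(), y_pred, y, colors="b", linestyles="dashed")
plt.show()
```

With an intercept in the model, the signed distances always sum to (numerically) zero, which is a quick sanity check on the column.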
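Subtracting the prediction gives the distance because your model predicts a y (dependent variable) for each value of the independent variable (x). Each point and its prediction share the same x coordinate, so the difference in y is the vertical distance from the point to the line (the residual), which is the value you want.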
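The difference above is the vertical distance, which is what least squares minimizes. If the shortest (perpendicular) distance from each point to the line is wanted instead, one option is to scale the absolute residual by 1 / sqrt(1 + m^2), where m is the fitted slope (model.coef_[0]). A minimal sketch, assuming the sample data from the question; the column name 'Perpendicular Distance' is a hypothetical choice:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample data from the question
df = pd.DataFrame({
    "Exam Results": [93, 94, 92, 88, 91, 87, 89],
    "Hours Studied": [8.232795, 7.879095, 6.972698, 6.854017,
                      6.043066, 5.510013, 5.509297],
})

x = df["Hours Studied"].values[:, np.newaxis]
y = df["Exam Results"].values

model = LinearRegression().fit(x, y)  # fit() returns the fitted estimator
m = model.coef_[0]                    # slope of the fitted line

# Line y = m*x + b rewritten as m*x - y + b = 0; the distance from (x0, y0)
# is |m*x0 - y0 + b| / sqrt(m**2 + 1), and |m*x0 + b - y0| is |residual|.
residual = y - model.predict(x)
df["Perpendicular Distance"] = np.abs(residual) / np.sqrt(m**2 + 1)
print(df)
```

Note that the perpendicular distance mixes the units of both axes (hours and marks), so it is only geometrically meaningful when the two axes share a comparable scale; the vertical residual does not have this issue.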