Code for best fit straight line of a scatter plot in Python

Below is my code for scatter-plotting the data in my text file. The file contains two columns: the left column holds the x coordinates and the right column holds the y coordinates. The code creates a scatter plot of x vs. y. I need code to overplot a line of best fit on the scatter plot, and none of the built-in pylab functions have worked for me.

import matplotlib.pyplot as plt

# read the two whitespace-separated columns as floats
with open('file.txt') as f:
    out = [(float(x), float(y)) for x, y in (line.split() for line in f)]

# split the (x, y) pairs into two sequences and plot all points in one call
xs, ys = zip(*out)
plt.scatter(xs, ys)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('My Title')
plt.show()

asked Mar 07 '14 01:03

Astronerd326

3 Answers

A one-line version of this excellent answer to plot the line of best fit is:

plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)))

Using np.unique(x) instead of x handles the case where x isn't sorted or has duplicate values.
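
Unpacked into separate steps, the one-liner might look like the sketch below. The sample arrays are made up for illustration, and the names slope, intercept, x_line and y_line are not from the original answer.

```python
import numpy as np

# Made-up sample data: unsorted, with a repeated x value.
x = np.array([3.0, 1.0, 2.0, 3.0, 5.0, 4.0])
y = np.array([3.1, 0.9, 2.2, 2.9, 5.2, 3.8])

# Degree-1 least-squares fit; polyfit returns the highest power first.
slope, intercept = np.polyfit(x, y, 1)

# np.unique returns the sorted, de-duplicated x values.
x_line = np.unique(x)

# Evaluate the fitted line at those x values
# (this is what np.poly1d(...)(np.unique(x)) does in the one-liner).
y_line = slope * x_line + intercept

print(x_line)
print(y_line)
```

Passing x_line and y_line to plt.plot should then draw the same line as the one-liner.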

answered Oct 22 '22 06:10

1''

Assuming the line of best fit for a set of points is:

y = a + b * x

where:

b = ( sum(xi * yi) - n * xbar * ybar ) / sum((xi - xbar)^2)
a = ybar - b * xbar

Code and plot

# sample points 
X = [0, 5, 10, 15, 20]
Y = [0, 7, 10, 13, 20]

# solve for a and b
def best_fit(X, Y):

    xbar = sum(X)/len(X)
    ybar = sum(Y)/len(Y)
    n = len(X) # or len(Y)

    numer = sum(xi*yi for xi, yi in zip(X, Y)) - n * xbar * ybar
    denom = sum(xi**2 for xi in X) - n * xbar**2

    b = numer / denom
    a = ybar - b * xbar

    print('best fit line:\ny = {:.2f} + {:.2f}x'.format(a, b))

    return a, b

# solution
a, b = best_fit(X, Y)
#best fit line:
#y = 0.80 + 0.92x

# plot points and fit line
import matplotlib.pyplot as plt
plt.scatter(X, Y)
yfit = [a + b * xi for xi in X]
plt.plot(X, yfit)
plt.show()
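
Note that the formula above writes the denominator as sum((xi - xbar)^2), while the code uses sum(xi^2) - n * xbar^2. The two forms are algebraically equal. A small check on the same sample points, with illustrative variable names that are not part of the answer, might look like this:

```python
# Same sample points as in the answer above.
X = [0, 5, 10, 15, 20]
Y = [0, 7, 10, 13, 20]

n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

# Denominator as written in the formula: squared deviations from the mean.
denom_centered = sum((xi - xbar) ** 2 for xi in X)

# Denominator as written in the code: the expanded form.
denom_expanded = sum(xi ** 2 for xi in X) - n * xbar ** 2

numer = sum(xi * yi for xi, yi in zip(X, Y)) - n * xbar * ybar

# Slope and intercept computed from the centered denominator.
b = numer / denom_centered
a = ybar - b * xbar

print(denom_centered, denom_expanded)
print('y = {:.2f} + {:.2f}x'.format(a, b))
```

Both denominators should agree up to floating-point rounding, so either form can be used in best_fit.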

UPDATE:

notebook version

answered Oct 22 '22 06:10

Aziz Alto

You can use numpy's polyfit. I use the following (you can safely remove the parts about the coefficient of determination and the error bounds; I just think they look nice):

#!/usr/bin/python3

import numpy as np
import matplotlib.pyplot as plt
import csv

with open("example.csv", "r") as f:
    data = [row for row in csv.reader(f)]
    xd = [float(row[0]) for row in data]
    yd = [float(row[1]) for row in data]

# sort the data
reorder = sorted(range(len(xd)), key = lambda ii: xd[ii])
xd = [xd[ii] for ii in reorder]
yd = [yd[ii] for ii in reorder]

# make the scatter plot
plt.scatter(xd, yd, s=30, alpha=0.15, marker='o')

# determine best fit line
par = np.polyfit(xd, yd, 1, full=True)

slope=par[0][0]
intercept=par[0][1]
xl = [min(xd), max(xd)]
yl = [slope*xx + intercept  for xx in xl]

# coefficient of determination, plot text
variance = np.var(yd)
residuals = np.var([(slope*xx + intercept - yy)  for xx,yy in zip(xd,yd)])
Rsqr = np.round(1-residuals/variance, decimals=2)
plt.text(.9*max(xd)+.1*min(xd),.9*max(yd)+.1*min(yd),'$R^2 = %0.2f$'% Rsqr, fontsize=30)

plt.xlabel("X Description")
plt.ylabel("Y Description")

# error bounds
yerr = [abs(slope*xx + intercept - yy)  for xx,yy in zip(xd,yd)]
par = np.polyfit(xd, yerr, 2, full=True)

yerrUpper = [(xx*slope+intercept)+(par[0][0]*xx**2 + par[0][1]*xx + par[0][2]) for xx,yy in zip(xd,yd)]
yerrLower = [(xx*slope+intercept)-(par[0][0]*xx**2 + par[0][1]*xx + par[0][2]) for xx,yy in zip(xd,yd)]

plt.plot(xl, yl, '-r')
plt.plot(xd, yerrLower, '--r')
plt.plot(xd, yerrUpper, '--r')
plt.show()
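
As a side note, with full=True np.polyfit also returns the sum of squared residuals as its second value, so the coefficient of determination can be computed from that directly instead of re-evaluating the line. A minimal sketch on made-up data (the names ss_res, ss_tot and r_squared are illustrative, not from the code above):

```python
import numpy as np

# Made-up sample data for illustration.
xd = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
yd = np.array([0.0, 7.0, 10.0, 13.0, 20.0])

# full=True returns (coefficients, residuals, rank, singular_values, rcond).
coeffs, residuals, rank, sing_vals, rcond = np.polyfit(xd, yd, 1, full=True)
slope, intercept = coeffs

# residuals holds the sum of squared residuals of the fit (one element here).
ss_res = residuals[0]

# Total sum of squares around the mean of y.
ss_tot = np.sum((yd - yd.mean()) ** 2)

r_squared = 1.0 - ss_res / ss_tot
print(slope, intercept, r_squared)
```

For a straight-line fit with an intercept, this should give the same value as the variance-based calculation above, since the residuals of such a fit average to zero.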

answered Oct 22 '22 04:10

Micah
