
How to add a line of best fit to a scatter plot

I'm currently working with Pandas and matplotlib to perform some data visualization and I want to add a line of best fit to my scatter plot.

Here is my code:

import matplotlib
import matplotlib.pyplot as plt
import pandas as panda
import numpy as np

def PCA_scatter(filename):

   matplotlib.style.use('ggplot')

   data = panda.read_csv(filename)
   data_reduced = data[['2005', '2015']]

   data_reduced.plot(kind='scatter', x='2005', y='2015')
   plt.show()

PCA_scatter('file.csv')

How do I go about this?

JavascriptLoser asked May 15 '16 03:05



1 Answer

You can do the whole fit and plot in one fell swoop with Seaborn.

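import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Raw string for the regex separator; 'fake.txt' is whitespace-delimited
data_reduced = pd.read_csv('fake.txt', sep=r'\s+')
# Draw the scatter plot and fitted line (recent seaborn needs x/y as keywords)
sns.regplot(x='2005', y='2015', data=data_reduced)
plt.show()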

[Image: regression plot]
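
If you would rather stay with pandas and matplotlib only, as in the question, one possible approach is to fit a straight line with numpy.polyfit and draw it on the same axes. This is only a sketch: the helper name scatter_with_fit and the made-up DataFrame are assumptions for illustration, not part of the original code, and the random data merely stands in for the '2005' and '2015' columns of the CSV.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_with_fit(data, x_col, y_col):
    """Scatter-plot two columns and overlay a least-squares straight line.

    Hypothetical helper for illustration; returns the axes and the fit.
    """
    x = data[x_col].to_numpy(dtype=float)
    y = data[y_col].to_numpy(dtype=float)

    # Degree-1 polynomial fit: coefficients come back highest power first
    slope, intercept = np.polyfit(x, y, deg=1)

    # pandas returns the matplotlib Axes, so the line can share the plot
    ax = data.plot(kind="scatter", x=x_col, y=y_col)

    # A straight line only needs its two end points
    xs = np.array([x.min(), x.max()])
    ax.plot(xs, slope * xs + intercept, color="black")
    return ax, slope, intercept


# Made-up data standing in for the real CSV (assumption)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
fake = pd.DataFrame({"2005": x, "2015": 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)})

ax, slope, intercept = scatter_with_fit(fake, "2005", "2015")
```

With real data you would pass the DataFrame from read_csv instead of the fake one and call plt.show() afterwards. Unlike regplot, this draws no confidence band, only the fitted line.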

Robert Calhoun answered Sep 28 '22 06:09