Algorithm for scatter plot 'best-fit' line

Tags: c#, mschart

I'm writing a small application in C# that uses the MSChart control to draw scatter plots of sets of X and Y data points. Some of these sets can be rather large (hundreds of data points).

I wanted to ask whether there's a 'standard' algorithm for plotting a best-fit line through the points. I'm thinking of dividing the X data points into a predefined number of sets, say 10 or 20, and for each set taking the average of the corresponding Y values and the middle X value, and so on, to build the line. Is this a correct approach?
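For comparison with the answers, here is a minimal sketch of the binning idea described above. It assumes equal-width bins over the X range and reads "the middle X value" as each bin's centre; `BinnedTrend` and `BinnedMeans` are hypothetical names, not part of MSChart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BinnedTrend
{
    // Hypothetical helper: splits the X range into binCount equal-width bins and
    // returns (bin centre, mean Y of the points in that bin) for each non-empty bin.
    public static List<(double X, double Y)> BinnedMeans(IList<(double X, double Y)> points, int binCount)
    {
        if (points.Count == 0) throw new ArgumentException("No points.");
        if (binCount < 1) throw new ArgumentOutOfRangeException(nameof(binCount));

        double minX = points.Min(p => p.X);
        double maxX = points.Max(p => p.X);
        double width = (maxX - minX) / binCount;

        var sums = new double[binCount];
        var counts = new int[binCount];
        foreach (var p in points)
        {
            // A point at maxX would fall into bin binCount, so clamp it into the last bin.
            int bin = width == 0 ? 0 : Math.Min((int)((p.X - minX) / width), binCount - 1);
            sums[bin] += p.Y;
            counts[bin]++;
        }

        var result = new List<(double X, double Y)>();
        for (int i = 0; i < binCount; i++)
        {
            if (counts[i] == 0) continue; // empty bins contribute no point
            double centre = minX + (i + 0.5) * width;
            result.Add((centre, sums[i] / counts[i]));
        }
        return result;
    }
}
```

This produces a piecewise trend of bin averages rather than a single straight line, and its shape depends on the bin count and on how many points land in each bin. The least-squares fit in the answers gives one line with no such tuning parameter.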

I've searched existing threads, but they all seem to be about doing the same thing in existing applications such as Matlab.

Thanks,

veezi asked Oct 18 '12 02:10


2 Answers

Using a linear least-squares algorithm (ordinary least squares):

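using System;
using System.Collections.Generic;
using System.Linq;

public class XYPoint
{
    public double X;   // double, so non-integer X values work and X * X cannot overflow an int
    public double Y;
}

class Program
{
    // Fits y = a * x + b by ordinary least squares and returns the fitted Y for each input X.
    public static List<XYPoint> GenerateLinearBestFit(List<XYPoint> points, out double a, out double b)
    {
        int numPoints = points.Count;
        if (numPoints < 2)
            throw new ArgumentException("At least two points are needed to fit a line.");

        double meanX = points.Average(point => point.X);
        double meanY = points.Average(point => point.Y);

        double sumXSquared = points.Sum(point => point.X * point.X);
        double sumXY = points.Sum(point => point.X * point.Y);

        // Population variance of X; zero (or rounding-negative) means every X is the same.
        double varianceX = sumXSquared / numPoints - meanX * meanX;
        if (varianceX <= 0)
            throw new ArgumentException("All X values are identical; the best-fit line would be vertical.");

        // Slope = cov(X, Y) / var(X)
        a = (sumXY / numPoints - meanX * meanY) / varianceX;
        // Intercept: the least-squares line passes through (meanX, meanY)
        b = meanY - a * meanX;

        // out parameters cannot be captured inside a lambda, so copy them to locals first
        double slope = a;
        double intercept = b;

        return points.Select(point => new XYPoint() { X = point.X, Y = slope * point.X + intercept }).ToList();
    }

    static void Main(string[] args)
    {
        List<XYPoint> points = new List<XYPoint>()
                                   {
                                       new XYPoint() {X = 1, Y = 12},
                                       new XYPoint() {X = 2, Y = 16},
                                       new XYPoint() {X = 3, Y = 34},
                                       new XYPoint() {X = 4, Y = 45},
                                       new XYPoint() {X = 5, Y = 47}
                                   };

        double a, b;

        List<XYPoint> bestFit = GenerateLinearBestFit(points, out a, out b);

        // "0.####" rather than "#.####", so values below 1 print as 0.5 instead of .5
        Console.WriteLine("y = {0:0.####}x {1:+0.####;-0.####}", a, b);

        for (int index = 0; index < points.Count; index++)
        {
            Console.WriteLine("X = {0}, Y = {1}, Fit = {2:0.###}", points[index].X, points[index].Y, bestFit[index].Y);
        }
    }
}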
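A variant worth considering: the formula in the code above computes the slope from raw sums (mean of XY minus product of means), which can lose precision through cancellation when X values are large relative to their spread (timestamps, for example). The sketch below subtracts the means first; it is algebraically the same least-squares fit. `CenteredLeastSquares` and `Fit` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CenteredLeastSquares
{
    // Hypothetical helper: fits y = slope * x + intercept using centred sums,
    //   slope = Σ(x - meanX)(y - meanY) / Σ(x - meanX)²
    // Subtracting the means before multiplying avoids cancellation for large X.
    public static (double Slope, double Intercept) Fit(IReadOnlyList<double> xs, IReadOnlyList<double> ys)
    {
        if (xs.Count != ys.Count) throw new ArgumentException("xs and ys must have the same length.");
        if (xs.Count < 2) throw new ArgumentException("At least two points are needed.");

        double meanX = xs.Average();
        double meanY = ys.Average();

        double sxy = 0, sxx = 0;
        for (int i = 0; i < xs.Count; i++)
        {
            double dx = xs[i] - meanX;
            sxy += dx * (ys[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) throw new ArgumentException("All X values are identical; the slope is undefined.");

        double slope = sxy / sxx;
        return (slope, meanY - slope * meanX); // the line passes through (meanX, meanY)
    }
}
```

Usage: `var (slope, intercept) = CenteredLeastSquares.Fit(xs, ys);`. With the sample data from `Main` (X = 1..5, Y = 12, 16, 34, 45, 47) this gives a slope of 9.9 and an intercept of 1.1, i.e. y = 9.9x + 1.1.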
Robert Slaney answered Nov 02 '22 01:11


Yes. You will want to use Linear Regression, specifically Simple Linear Regression.

The algorithm is essentially:

  • assume there exists a line of best fit, y = ax + b
  • for each of your points (x_i, y_i), measure how far it is from this line vertically: the residual y_i - (a x_i + b)
  • square each residual and sum them (squaring penalises points further from the line more heavily and keeps the sum differentiable)
  • find the values of a and b that minimise this sum using basic calculus: set the partial derivatives with respect to a and b to zero (as long as not all x values are equal, there is exactly one minimum)
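Carrying out the last step explicitly gives the closed-form solution. This is a worked derivation in the notation of the steps above, with \bar{x} and \bar{y} denoting the means of the x and y values (our notation):

```latex
S(a, b) = \sum_{i=1}^{n} \bigl(y_i - a x_i - b\bigr)^2,
\qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \bigl(y_i - a x_i - b\bigr) = 0
\quad\Longrightarrow\quad b = \bar{y} - a\,\bar{x}

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} x_i \bigl(y_i - a x_i - b\bigr) = 0
\quad\Longrightarrow\quad a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The second implication uses the first to substitute for b. The denominator is zero only when every x_i equals \bar{x}, which is the vertical-line case where no unique fit exists.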

The Wikipedia article on simple linear regression covers everything you need.

Kirk Broadhurst answered Nov 02 '22 03:11