I have a data set that defines a set of points on a 2-dimensional Cartesian plane. Theoretically, those points should form a line, but that line may be perfectly horizontal, perfectly vertical, or anything in between.
I would like to design an algorithm that rates the 'straightness' of that line.
For example, the following data sets would be perfectly straight:
Y = (2/3)x + 4
X | Y
---------
-3 | 2
0 | 4
3 | 6
Y = 4
X | Y
---------
1 | 4
2 | 4
3 | 4
X = -1
X | Y
---------
-1 | 7
-1 | 8
-1 | 9
While this one would not:
X | Y
---------
-3 | 2
0 | 5
3 | 6
I think it would work to minimize the sum of the squares of the distances from each point to a line (usually called a regression line), then determine the average distance of each point to that line. A perfectly straight line would then have an average distance of 0.
Because the data can represent a line that is vertical, as I understand it, the usual least-squares regression line won't work for this data set. A perpendicular least-squares regression line might work, but I've had little luck finding an implementation of one.
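For what it's worth, here is a minimal sketch of the perpendicular (orthogonal) least-squares idea, written in Python only because it is easy to test; the arithmetic should translate to VBA directly. The function name `straightness` is made up for illustration. It assumes the best-fit line passes through the centroid of the points and points along the principal axis of their scatter, which is the standard closed-form result for a 2-D orthogonal fit. Because it never computes a slope, vertical lines need no special handling.

```python
import math

def straightness(points):
    """Mean perpendicular distance from the points to their orthogonal
    least-squares line. 0.0 means the points are perfectly collinear.
    (Hypothetical helper name, for illustration only.)"""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # The orthogonal best-fit line passes through the centroid.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Centered sums of squares and cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    # Angle of the line's direction (principal axis of the scatter).
    # atan2 copes with sxx == syy and with vertical lines (sxx == 0).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    # Unit normal to the line; projecting onto it gives signed distances.
    nx, ny = -math.sin(theta), math.cos(theta)

    return sum(abs((x - mean_x) * nx + (y - mean_y) * ny)
               for x, y in points) / n


straight = [(-3, 2), (0, 4), (3, 6)]
horizontal = [(1, 4), (2, 4), (3, 4)]
vertical = [(-1, 7), (-1, 8), (-1, 9)]
bent = [(-3, 2), (0, 5), (3, 6)]

for name, pts in [("straight", straight), ("horizontal", horizontal),
                  ("vertical", vertical), ("bent", bent)]:
    print(name, straightness(pts))
```

The first three data sets should come out at zero (up to floating-point rounding) and the last one clearly above zero. Note that the result is in the same units as the data, so it is not scale-free; dividing by the spread of the points along the line would be one way to normalize it, if that matters for your use.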
I am working in Excel 2010 VBA, but I should be able to translate any reasonable algorithm.
Thanks, PaulH
Things like RSQ and LinEst won't work for this because I need a universal measurement that includes vertical lines. As a line's slope approaches infinity (vertical), its RSQ approaches 0 even if the line is perfectly straight or nearly so.
-PaulH
Sounds like you are looking for R², the coefficient of determination.
Basically, you take the residual sum of squares, divide by the total sum of squares, and subtract the result from 1.
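As a rough sketch of that calculation (Python, with a made-up function name `r_squared`), assuming an ordinary least-squares fit of Y on X. Note that the ratio is undefined when every X is equal (a vertical line) or every Y is equal (a horizontal line); this sketch returns `None` in those cases rather than guessing a value.

```python
def r_squared(points):
    """R^2 = 1 - SS_res / SS_tot for an ordinary least-squares fit of y on x.
    Returns None where the value is undefined (all x equal or all y equal).
    (Hypothetical helper name, for illustration only.)"""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)  # total sum of squares
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    if sxx == 0 or ss_tot == 0:
        return None  # vertical or horizontal data: no meaningful R^2

    # Ordinary least-squares line y = slope * x + intercept.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares, measured vertically from each point.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)

    return 1.0 - ss_res / ss_tot


print(r_squared([(-3, 2), (0, 4), (3, 6)]))   # perfectly straight data
print(r_squared([(-3, 2), (0, 5), (3, 6)]))   # slightly bent data
print(r_squared([(-1, 7), (-1, 8), (-1, 9)])) # vertical data
```

The residuals here are vertical distances, not perpendicular ones, which is why this measure breaks down for vertical data.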