(Preface: I'm neither a statistician nor a programmer. I work in the humanities, so have mercy on my soul).
I need to calculate the Euclidean distance between a series of points in R. I've been using dist(), as follows:
> x <- c(0,0)
> y <- c(0,10)
> dist(rbind(x,y))
   x
y 10
So far, so good. But when I was looking at my results (with real numbers), they were horribly off. So much so that I figured my R script was grabbing data from the wrong columns. But I checked, and it isn't.
So I started playing around with toy numbers, and I was in for a surprise. The above example (a vertical line) works correctly, as does the following (a horizontal line):
> x <- c(0,10)
> y <- c(0,0)
> dist(rbind(x,y))
   x
y 10
But when the line the two points form is diagonal, strangeness ensues:
> x <- c(0,10)
> y <- c(0,10)
> dist(rbind(x,y))
  x
y 0
A distance of 0? Huh? That can't be right.
And when the points are identical (that's quite possible in my data), we go down the rabbit hole:
> x <- c(0,0)
> y <- c(10,10)
> dist(rbind(x,y))
         x
y 14.14214
Should this not be 0? The points are identical, after all, so there can be no distance between them.
Just in case there's something wrong with dist(), I tried to implement the formula manually, going by Wikipedia. Same results:
> sqrt(sum((x - y) ^ 2))
[1] 14.14214
As I said above, my math background is minimal, so I fully expect that the error here is mine. If so, please explain what it is and how to correct it. But from where I stand right now, it seems like something is very wrong.
And worst of all, I can't analyze my data.
Euclidean distance is the straight-line distance between two points, which is the shortest possible distance between them. The formula is: Euclidean distance = sqrt(Σ (x_i - y_i)^2), where x and y are the two points and x_i and y_i are their i-th coordinates. The distance between two arrays can also be calculated in R; the array() function takes a vector and an array dimension as inputs.
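As a minimal sketch of that formula in R, the snippet below computes the distance by hand and compares it with dist(). The points p and q are made-up values for illustration; each vector holds the coordinates of one point.

```r
# Two hypothetical points in the plane, each written as (x-coordinate, y-coordinate).
p <- c(0, 0)
q <- c(10, 10)

# Square the coordinate-wise differences, sum them, then take the square root.
manual <- sqrt(sum((p - q)^2))

# dist() treats each ROW of its input as one point, so rbind(p, q) is a two-point matrix.
builtin <- as.numeric(dist(rbind(p, q)))

print(manual)   # 14.14214
print(builtin)  # 14.14214
```

Both values equal sqrt(200), the length of the diagonal from (0, 0) to (10, 10).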
The normalized squared Euclidean distance gives the squared distance between two vectors after their lengths have been scaled to unit norm. This is helpful when the direction of the vectors is meaningful but their magnitude is not.
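One reading of that description can be sketched in R as follows. The helper names to_unit and normalized_sq_dist are hypothetical, not part of base R, and the sketch assumes that "normalized" simply means dividing each vector by its own Euclidean norm; other definitions exist, so check the one your software uses.

```r
# Hypothetical helper: scale a vector to unit Euclidean norm.
# Assumes v is not the zero vector.
to_unit <- function(v) v / sqrt(sum(v^2))

# Squared Euclidean distance between the two unit-scaled vectors.
normalized_sq_dist <- function(u, v) sum((to_unit(u) - to_unit(v))^2)

a <- c(3, 4)
b <- c(6, 8)    # same direction as a, twice the length
d <- c(-4, 3)   # perpendicular to a

same_dir <- normalized_sq_dist(a, b)
perp     <- normalized_sq_dist(a, d)

print(same_dir)  # 0: only direction matters, and the directions agree
print(perp)      # 2: unit vectors at a right angle
```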
The Euclidean distance is always greater than or equal to zero. The measurement would be zero for identical points and high for points that show little similarity. The figure below shows an example of two points called a and b.
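It looks like you want dist(cbind(x, y)), not dist(rbind(x, y)). dist() computes the distances between the rows of its argument, so each row has to be one point. With rbind(x, y), the rows are your x-coordinates and your y-coordinates, not your points.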
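The difference can be checked with the toy numbers from the question. This is a small sketch, with x holding the x-coordinates and y holding the y-coordinates as in the question; the variable names other than x and y are made up for illustration.

```r
# The "diagonal" case from the question: points (0, 0) and (10, 10).
x <- c(0, 10)
y <- c(0, 10)

# cbind() builds one row per point.
points_by_row <- cbind(x, y)
print(points_by_row)

# rbind() builds one row per vector, so dist() compares x with y and gets 0.
d_rbind <- as.numeric(dist(rbind(x, y)))

# With one point per row, dist() returns the length of the diagonal.
d_cbind <- as.numeric(dist(points_by_row))

print(d_rbind)  # 0
print(d_cbind)  # 14.14214

# The "identical points" case: both points are (0, 10).
x2 <- c(0, 0)
y2 <- c(10, 10)
d_same <- as.numeric(dist(cbind(x2, y2)))
print(d_same)   # 0
```

With real data, the same idea applies: pass dist() a matrix or data frame that has one row per point and one column per coordinate.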