
Euclidean distance calculations in R not making sense

Tags: r

(Preface: I'm neither a statistician nor a programmer. I work in the humanities, so have mercy on my soul.)

I need to calculate the Euclidean distance between a series of points in R. I've been using dist(), as follows:

> x <- c(0,0)
> y <- c(0,10)
> dist(rbind(x,y))
   x
y 10

So far, so good. But when I was looking at my results (with real numbers), they were horribly off. So much so that I figured my R script was grabbing data from the wrong columns. But I checked, and it isn't.

So I started playing around with toy numbers, and I was in for a surprise. The above example (a vertical line) works correctly, as does the following (a horizontal line):

> x <- c(0,10)
> y <- c(0,0)
> dist(rbind(x,y))
   x
y 10

But when the line the two points form is diagonal, strangeness ensues:

> x <- c(0,10)
> y <- c(0,10)
> dist(rbind(x,y))
  x
y 0

A distance of 0? Huh? That can't be right.

And when the points are identical (that's quite possible in my data), we go down the rabbit hole:

> x <- c(0,0)
> y <- c(10,10)
> dist(rbind(x,y))
     x
y 14.14214

Should this not be 0? The points are identical, after all, so there can be no distance between them.

Just in case there's something wrong with dist(), I tried to implement the formula manually, going by Wikipedia. Same results:

> sqrt(sum((x - y) ^ 2))
[1] 14.14214

As I said above, my math background is minimal, so I fully expect that the error here is mine. If so, please explain what it is and how to correct it. But from where I stand right now, it seems like something is very wrong.

And worst of all, I can't analyze my data.

Gil Williams asked Oct 18 '11 02:10



1 Answer

It looks like you want dist(cbind(x, y)), not dist(rbind(x, y)).
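
To illustrate why the swap matters: dist() computes distances between the rows of its input. The sketch below assumes, as the question does, that x holds the x-coordinates and y holds the y-coordinates of the two points (0, 0) and (10, 10). The names by_row, by_col, d_rbind and d_cbind are made up for this example.

```r
# x-coordinates and y-coordinates of two points: (0, 0) and (10, 10)
x <- c(0, 10)
y <- c(0, 10)

# rbind() stacks the vectors as rows, so dist() compares the vector of
# x-coordinates against the vector of y-coordinates (identical here).
by_row <- rbind(x, y)
d_rbind <- as.numeric(dist(by_row))

# cbind() places the vectors side by side as columns, so each row is
# one (x, y) point and dist() measures the distance between the points.
by_col <- cbind(x, y)
d_cbind <- as.numeric(dist(by_col))

print(d_rbind)  # [1] 0
print(d_cbind)  # [1] 14.14214
```

With cbind(), the diagonal case gives sqrt(10^2 + 10^2), which is the result expected in the question. The same reasoning applies to the manual formula: sqrt(sum((x - y)^2)) treats x and y as two points, not as two coordinate columns.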

Hong Ooi answered Oct 21 '22 04:10