
Weighted Euclidean Distance in R

I'd like to create a distance matrix of weighted Euclidean distances from a data frame. The weights will be defined in a vector. Here's an example:

library("cluster")

a <- c(1,2,3,4,5)
b <- c(5,4,3,2,1)
c <- c(5,4,1,2,3)
df <- data.frame(a,b,c)

weighting <- c(1, 2, 3)

dm <- as.matrix(daisy(df, metric = "euclidean", weights = weighting))

I've searched everywhere and can't find a package or solution for this in R. The 'daisy' function in the 'cluster' package claims to support weighting, but the weights don't seem to be applied: it just returns the regular (unweighted) Euclidean distances.

Any ideas, Stack Overflow?

asked Aug 30 '16 at 20:08 by brent_mused




1 Answer

We can use @WalterTross' technique of scaling by multiplying each column by the square root of its respective weight first:

newdf <- sweep(df, 2, weighting, function(x,y) x * sqrt(y))
as.matrix(daisy(newdf, metric="euclidean"))
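
As a quick sanity check on the scaling approach, the sketch below compares one entry of the scaled distance matrix with the same distance computed by hand. It assumes the weighted distance you are after is sqrt(sum(w * (x - y)^2)), and it uses base R's dist() instead of daisy() only to stay dependency-free. The names scaled, d_scaled and d_direct are illustrative, not part of the original code.

```r
# Data and weights from the question
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(5, 4, 3, 2, 1),
                 c = c(5, 4, 1, 2, 3))
weighting <- c(1, 2, 3)

# Multiply each column by sqrt(weight), then take plain Euclidean distances
scaled <- sweep(as.matrix(df), 2, sqrt(weighting), `*`)
d_scaled <- as.matrix(dist(scaled))

# Rows 1 and 2 by hand, ASSUMING d(x, y) = sqrt(sum(w * (x - y)^2))
x <- unlist(df[1, ])
y <- unlist(df[2, ])
d_direct <- sqrt(sum(weighting * (x - y)^2))

all.equal(d_scaled[1, 2], d_direct)  # the two agree, both are sqrt(6)
```

If that assumed formula is not the weighting you want, the check still shows exactly what the sqrt scaling produces.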

If you would like more control, and a clearer view of how the Euclidean distance is actually computed, we can write a custom function. Note that I have chosen a different weighting method, in which the weights multiply the coordinate differences directly, before squaring:

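# All (i, j) pairs of row indices of d
xpand <- function(d) do.call("expand.grid", rep(list(seq_len(nrow(d))), 2))
# Euclidean norm of a vector (or of a one-row data frame)
euc_norm <- function(x) sqrt(sum(x^2))
# Pairwise distances between the rows of mat; weights multiply the differences
euc_dist <- function(mat, weights=1) {
  iter <- xpand(mat)
  vec <- mapply(function(i,j) euc_norm(weights*(mat[i,] - mat[j,])),
                iter[,1], iter[,2])
  matrix(vec,nrow(mat), nrow(mat))
}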
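
Reading the two code blocks side by side, the two weighting conventions can be written out as below. The symbols w_k for the weights and x_k, y_k for the coordinates of two rows are notation introduced here for illustration, not the author's.

```latex
% Scaling columns by sqrt(w_k): weights act on the squared differences
d_{\text{scaled}}(x, y) = \sqrt{\sum_{k} w_k \,(x_k - y_k)^2}

% euc_dist(mat, weights = w): weights act on the differences before squaring
d_{\text{euc\_dist}}(x, y) = \sqrt{\sum_{k} \bigl(w_k \,(x_k - y_k)\bigr)^2}
                           = \sqrt{\sum_{k} w_k^2 \,(x_k - y_k)^2}
```

Written this way, the two coincide for all rows only when every weight is 0 or 1, and passing sqrt(w) as the weights of euc_dist should reproduce the scaled version. With the default weights=1, both reduce to the ordinary Euclidean distance.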

We can test the result by checking against the daisy function:

#test1
as.matrix(daisy(df, metric="euclidean"))
#          1        2        3        4        5
# 1 0.000000 1.732051 4.898979 5.196152 6.000000
# 2 1.732051 0.000000 3.316625 3.464102 4.358899
# 3 4.898979 3.316625 0.000000 1.732051 3.464102
# 4 5.196152 3.464102 1.732051 0.000000 1.732051
# 5 6.000000 4.358899 3.464102 1.732051 0.000000

euc_dist(df)
#          [,1]     [,2]     [,3]     [,4]     [,5]
# [1,] 0.000000 1.732051 4.898979 5.196152 6.000000
# [2,] 1.732051 0.000000 3.316625 3.464102 4.358899
# [3,] 4.898979 3.316625 0.000000 1.732051 3.464102
# [4,] 5.196152 3.464102 1.732051 0.000000 1.732051
# [5,] 6.000000 4.358899 3.464102 1.732051 0.000000

I doubt Walter's method for two reasons. First, I've never seen weights applied through their square root; it's usually 1/w. Second, when I apply your weights to my function, I get a different result:

euc_dist(df, weights=weighting) 
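
To see how far apart the two conventions are on the question's data, here is a dependency-free sketch using base R's dist(). The names m, d_linear and d_sqrt are illustrative, and d_linear is only intended to mirror what euc_dist(df, weights = weighting) computes, not to replace it.

```r
# Data and weights from the question
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(5, 4, 3, 2, 1),
                 c = c(5, 4, 1, 2, 3))
weighting <- c(1, 2, 3)
m <- as.matrix(df)

# Weights applied to the differences: sqrt(sum((w * (x - y))^2))
d_linear <- as.matrix(dist(sweep(m, 2, weighting, `*`)))

# Weights applied to the squared differences: sqrt(sum(w * (x - y)^2))
d_sqrt <- as.matrix(dist(sweep(m, 2, sqrt(weighting), `*`)))

# Rows 1 and 2 differ by (-1, 1, 1), so:
d_linear[1, 2]  # sqrt(1 + 4 + 9) = sqrt(14), about 3.741657
d_sqrt[1, 2]    # sqrt(1 + 2 + 3) = sqrt(6),  about 2.449490
```

Neither result is wrong as such; they answer different questions about what a "weight" means, so it is worth deciding which definition you need before picking one.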

answered Sep 20 '22 at 08:09 by Pierre L