Weighted Euclidean Distance in R

Tags: r, euclidean-distance, cluster-analysis, r-daisy

I'd like to create a distance matrix of weighted Euclidean distances from a data frame. The weights are defined in a vector. Here's an example:

library("cluster")

a <- c(1,2,3,4,5)
b <- c(5,4,3,2,1)
c <- c(5,4,1,2,3)
df <- data.frame(a,b,c)

weighting <- c(1, 2, 3)

dm <- as.matrix(daisy(df, metric = "euclidean", weights = weighting))

I've searched everywhere and can't find a package or solution for this in R. The 'daisy' function in the 'cluster' package claims to support weighting, but the weights don't seem to be applied: it just returns regular Euclidean distances.

Any ideas, Stack Overflow?

asked Aug 30 '16 20:08

brent_mused

1 Answer

We can use @WalterTross's technique: multiply each column by the square root of its respective weight first, then compute ordinary Euclidean distances on the scaled data:

newdf <- sweep(df, 2, weighting, function(x,y) x * sqrt(y))
as.matrix(daisy(newdf, metric="euclidean"))
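To see why the square-root scaling works, here is a minimal sketch that assumes the intended definition is d(i, j) = sqrt(sum_k w_k * (x_ik - x_jk)^2). It uses base R's dist() in place of daisy() so that it runs without the cluster package, which is a substitution made for illustration. As far as the cluster documentation describes it, daisy's weights argument is only used for Gower-type dissimilarities, which would explain why it has no effect with metric = "euclidean".

```r
# Example data and weights from the question
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(5, 4, 3, 2, 1),
                 c = c(5, 4, 1, 2, 3))
weighting <- c(1, 2, 3)

# Scale each column by the square root of its weight, then take plain
# Euclidean distances (base R dist() is used here instead of daisy())
x <- as.matrix(df)
scaled <- sweep(x, 2, sqrt(weighting), `*`)
dm_scaled <- as.matrix(dist(scaled))

# Direct evaluation of sqrt(sum_k w_k * (x_ik - x_jk)^2) for rows 1 and 2
d12 <- sqrt(sum(weighting * (x[1, ] - x[2, ])^2))

# The scaled-column distance and the direct formula should agree
all.equal(unname(dm_scaled[1, 2]), d12)
```

Squaring a difference that has been multiplied by sqrt(w) gives w times the squared difference, so the weight enters the sum linearly.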

But in case you would like more control over, and a better understanding of, what Euclidean distance is, we can write a custom function. Note that I have chosen a different weighting method:

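# All ordered pairs (i, j) of row indices
xpand <- function(d) do.call("expand.grid", rep(list(1:nrow(d)), 2))
# Euclidean norm of a vector
euc_norm <- function(x) sqrt(sum(x^2))
# Pairwise row distances; each coordinate difference is multiplied by its weight
euc_dist <- function(mat, weights=1) {
  iter <- xpand(mat)
  vec <- mapply(function(i,j) euc_norm(weights*(mat[i,] - mat[j,])),
                iter[,1], iter[,2])
  matrix(vec,nrow(mat), nrow(mat))
}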

We can test the result by checking against the daisy function:

#test1
as.matrix(daisy(df, metric="euclidean"))
#          1        2        3        4        5
# 1 0.000000 1.732051 4.898979 5.196152 6.000000
# 2 1.732051 0.000000 3.316625 3.464102 4.358899
# 3 4.898979 3.316625 0.000000 1.732051 3.464102
# 4 5.196152 3.464102 1.732051 0.000000 1.732051
# 5 6.000000 4.358899 3.464102 1.732051 0.000000

euc_dist(df)
#          [,1]     [,2]     [,3]     [,4]     [,5]
# [1,] 0.000000 1.732051 4.898979 5.196152 6.000000
# [2,] 1.732051 0.000000 3.316625 3.464102 4.358899
# [3,] 4.898979 3.316625 0.000000 1.732051 3.464102
# [4,] 5.196152 3.464102 1.732051 0.000000 1.732051
# [5,] 6.000000 4.358899 3.464102 1.732051 0.000000

I doubt Walter's method for two reasons. First, I've never seen weights applied through their square root; in my experience it's usually 1/w. Second, when I apply your weights in my function, I get a different result:

euc_dist(df, weights=weighting)
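One likely source of the discrepancy, sketched here as an assumption about what each approach computes rather than a verdict on which is correct: euc_dist multiplies each coordinate difference by its weight before squaring, so the weight is effectively squared inside the sum, while the column-scaling approach lets the weight enter linearly. The variable names below are illustrative only.

```r
# Rows 1 and 2 of the example data, and the weights from the question
x1 <- c(a = 1, b = 5, c = 5)
x2 <- c(a = 2, b = 4, c = 4)
w  <- c(1, 2, 3)
delta <- x1 - x2

# Column-scaling convention: sqrt(sum(w * delta^2))
d_linear <- sqrt(sum(w * delta^2))

# Convention used by euc_dist: sqrt(sum((w * delta)^2)), i.e. w is squared
d_mult <- sqrt(sum((w * delta)^2))

# Passing sqrt(w) to the multiplicative form recovers the linear convention
d_mult_sqrt <- sqrt(sum((sqrt(w) * delta)^2))

isTRUE(all.equal(d_linear, d_mult))       # the two conventions differ for these weights
isTRUE(all.equal(d_linear, d_mult_sqrt))  # they agree once sqrt(w) is supplied
```

Under this reading, calling euc_dist(df, weights = sqrt(weighting)) would be expected to match the scaled-column result, so the two methods differ in how the weights are parameterized, not in the distance itself.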

answered Sep 20 '22 08:09

Pierre L
