I have a dataset containing something like this:
case,group,val1,val2,val3,val4
1,1,3,5,6,8
2,1,2,7,5,4
3,2,1,3,6,8
4,2,5,4,3,7
5,1,8,6,5,3
I'm trying to compute, programmatically, the mean Euclidean distance between the value vectors within each group.
I have a number of cases spread across several groups. The Euclidean distance is computed between every pair of rows in a group and then averaged for that group. So, in the example above, I first compute the mean and standard deviation of each value column for group 1 (cases 1, 2 and 5), then standardise the values, i.e. (original value - mean) / std dev, then compute the ED between cases 1 and 2, cases 2 and 5, and cases 1 and 5, and finally average those EDs for the group.
Can anyone suggest a neat and reasonably efficient way of achieving this?
Yes, it is probably easier in R...
Your data:
dat <- data.frame(case  = 1:5,
                  group = c(1, 1, 2, 2, 1),
                  val1  = c(3, 2, 1, 5, 8),
                  val2  = c(5, 7, 3, 4, 6),
                  val3  = c(6, 5, 6, 3, 5),
                  val4  = c(8, 4, 8, 7, 3))
A short solution:
library(plyr)
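# Drop the constant "group" column inside each piece: scale() would turn it
# into NaN, and dist() would then rescale the distances upwards.
ddply(dat[c("group", "val1", "val2", "val3", "val4")], "group",
      function(x) c(mean.ED = mean(dist(scale(as.matrix(x[names(x) != "group"]))))))
#   group  mean.ED
# 1     1 2.791629
# 2     2 2.828427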
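If you would rather see each step spelled out, here is a minimal base R sketch of the same idea without plyr. It assumes the column names from the example data; `val_cols` and `mean_ed` are just names I picked for illustration. Only the value columns are standardised. Including a constant column such as `group` would inflate every distance by a factor of sqrt(5/4), because scale() turns that column into NaN and dist() rescales the sum to make up for the column it had to skip.

```r
# Example data from the question
dat <- data.frame(case  = 1:5,
                  group = c(1, 1, 2, 2, 1),
                  val1  = c(3, 2, 1, 5, 8),
                  val2  = c(5, 7, 3, 4, 6),
                  val3  = c(6, 5, 6, 3, 5),
                  val4  = c(8, 4, 8, 7, 3))

# Columns that make up each case's value vector (assumed names)
val_cols <- c("val1", "val2", "val3", "val4")

# Split the value columns by group, then for each group:
#   1. standardise every column with that group's own mean and sd
#   2. compute all pairwise Euclidean distances between the rows
#   3. average those distances
mean_ed <- sapply(split(dat[val_cols], dat$group), function(g) {
  z <- scale(as.matrix(g))  # (value - column mean) / column sd
  mean(dist(z))             # mean of the pairwise distances
})

mean_ed
# roughly 2.79 for group 1 and 2.83 for group 2
```

Group 2 is easy to check by hand: with only two rows, every standardised column holds +0.7071 and -0.7071, so each of the four columns contributes 2 to the squared distance, giving sqrt(8), about 2.83. Note that a group with a single case, or a value column that is constant within a group, would produce NaN here and would need separate handling.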