How can I create a distance matrix containing the mean absolute differences between each pair of columns?

Given the data frame,

df <- read.table(text="
 X1 X2 X3 X4 X5
  1  2  3  2  1
  2  3  4  4  3
  3  4  4  6  2
  4  5  5  5  4
  2  3  3  3  6
  5  6  2  8  4", header=T)

I want to create a distance matrix containing the mean absolute difference between each pair of columns, taken row by row. For example, the distance between X1 and X3 should be 1.67, given that:

(abs(1-3) + abs(2-4) + abs(3-4) + abs(4-5) + abs(2-3) + abs(5-2)) / 6 = 10 / 6 = 1.67
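
The same arithmetic can be checked directly in R. This is only a minimal sketch for the single pair X1 and X3; the names abs_diff and d13 are illustrative and not part of any package.

```r
df <- read.table(text="
 X1 X2 X3 X4 X5
  1  2  3  2  1
  2  3  4  4  3
  3  4  4  6  2
  4  5  5  5  4
  2  3  3  3  6
  5  6  2  8  4", header=TRUE)

# Row-by-row absolute differences between columns X1 and X3
abs_diff <- abs(df$X1 - df$X3)     # 2 2 1 1 1 3

# Sum of the differences divided by the number of rows
d13 <- sum(abs_diff) / nrow(df)    # 10 / 6
round(d13, 2)                      # 1.67
```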

I have tried using the designdist() function in the vegan package this way:

designdist(t(df), method = "abs(A-B)/6", terms = "minimum")

The resulting distance for columns X1 and X3 is 0.666. The problem is that designdist() first sums all the values in each column and then takes the difference of those sums. What I need instead is to take the absolute difference row by row, sum those differences, and then divide by N, the number of rows.
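
To see where 0.666 comes from, the sketch below reproduces both calculations in base R. It does not call designdist(); it assumes, as described above, that the formula ends up being applied to the column totals. The names col_totals, from_totals and wanted are illustrative.

```r
df <- read.table(text="
 X1 X2 X3 X4 X5
  1  2  3  2  1
  2  3  4  4  3
  3  4  4  6  2
  4  5  5  5  4
  2  3  3  3  6
  5  6  2  8  4", header=TRUE)

# Totals per column: X1 sums to 17, X3 sums to 21
col_totals <- colSums(df)

# Difference of the totals, divided by 6 (the unwanted result)
from_totals <- abs(col_totals[["X1"]] - col_totals[["X3"]]) / 6   # 4 / 6

# Mean of the row-by-row absolute differences (the wanted result)
wanted <- mean(abs(df$X1 - df$X3))                                # 10 / 6
```

The two numbers differ because positive and negative row differences cancel once the columns are summed first.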

Werner asked May 22 '12 17:05



1 Answer

Here's a one-line solution. It uses dist()'s method argument to calculate the L1 distance (also known as city block or Manhattan distance) between each pair of columns in your data.frame. Because dist() compares rows, the data.frame is transposed with t() first; dividing by nrow(df) then turns each sum of absolute differences into a mean.

as.matrix(dist(t(df), "manhattan", diag=TRUE, upper=TRUE)/nrow(df))

To make it reproducible:

df <- read.table(text="
 X1 X2 X3 X4 X5
  1  2  3  2  1
  2  3  4  4  3
  3  4  4  6  2
  4  5  5  5  4
  2  3  3  3  6
  5  6  2  8  4", header=TRUE)

# t(df) puts the columns X1..X5 in rows, which is what dist() compares
dmat <- as.matrix(dist(t(df), "manhattan", diag=TRUE, upper=TRUE)/nrow(df))
print(dmat, digits=3)
#      X1    X2   X3    X4   X5
# X1 0.00 1.000 1.67 1.833 1.17
# X2 1.00 0.000 1.00 0.833 1.50
# X3 1.67 1.000 0.00 1.500 1.83
# X4 1.83 0.833 1.50 0.000 2.33
# X5 1.17 1.500 1.83 2.333 0.00
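
As a cross-check, the mean absolute difference between columns can also be computed straight from its definition and compared with dist() applied to the transposed data. This is a minimal sketch; mean_abs_diff, by_hand and via_dist are hypothetical names introduced here for illustration.

```r
df <- read.table(text="
 X1 X2 X3 X4 X5
  1  2  3  2  1
  2  3  4  4  3
  3  4  4  6  2
  4  5  5  5  4
  2  3  3  3  6
  5  6  2  8  4", header=TRUE)

# Hypothetical helper: mean absolute difference between two numeric vectors
mean_abs_diff <- function(a, b) mean(abs(a - b))

# Apply the helper to every pair of columns, giving a 5 x 5 matrix
cols <- names(df)
by_hand <- sapply(cols, function(i)
  sapply(cols, function(j) mean_abs_diff(df[[i]], df[[j]])))

# Same quantity via Manhattan distance between the rows of t(df)
via_dist <- as.matrix(dist(t(df), method = "manhattan")) / nrow(df)

# Compare the numeric values only, ignoring attributes such as dimnames
all.equal(by_hand, via_dist, check.attributes = FALSE)
```

The explicit version is slower on wide data but makes it easy to see that the X1/X3 entry is the 10 / 6 from the question.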
Josh O'Brien answered Sep 30 '22 10:09
