Convert a dataframe to an object of class "dist" without actually calculating distances in R

Tags:

r

distance

I have a data frame of pairwise distances:

df <- data.frame(site.x = c("A", "A", "A", "B", "B", "C"),
                 site.y = c("B", "C", "D", "C", "D", "D"),
                 Distance = c(67, 57, 64, 60, 67, 60))

I need to convert this to an object of class "dist", but the distances are already computed, so I cannot use the dist() function. Any advice?

Elizabeth asked Jul 05 '12 11:07


4 Answers

There is nothing stopping you from creating the dist object yourself. It is just a vector of distances with attributes that set up the labels, size, etc.

Using your df, this is how:

dij2 <- with(df, Distance)
nams <- with(df, unique(c(as.character(site.x), as.character(site.y))))
attributes(dij2) <- with(df, list(Size = length(nams),
                                  Labels = nams,
                                  Diag = FALSE,
                                  Upper = FALSE,
                                  method = "user"))
class(dij2) <- "dist"

Or you can do this via structure() directly:

dij3 <- with(df, structure(Distance,
                           Size = length(nams),
                           Labels = nams,
                           Diag = FALSE,
                           Upper = FALSE,
                           method = "user",
                           class = "dist"))

These give:

> df
  site.x site.y Distance
1      A      B       67
2      A      C       57
3      A      D       64
4      B      C       60
5      B      D       67
6      C      D       60
> dij2
   A  B  C
B 67      
C 57 60   
D 64 67 60
> dij3
   A  B  C
B 67      
C 57 60   
D 64 67 60

Note: Neither version checks that the data are in the right order. The values must follow the lower triangle of the distance matrix, column by column, as they do in the example; i.e. sort df by site.x, then site.y, before you run the code I show.
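If the rows might arrive in a different order, the sorting and a few sanity checks can be done before building the object. This is only a sketch: the shuffled df and the name dij_sorted are made up for illustration, and it assumes every pair appears exactly once with site.x sorting before site.y.

```r
# Same pairs as in the question, deliberately shuffled
df <- data.frame(site.x = c("C", "A", "B", "A", "B", "A"),
                 site.y = c("D", "C", "D", "B", "C", "D"),
                 Distance = c(60, 57, 67, 67, 60, 64),
                 stringsAsFactors = FALSE)

nams <- sort(unique(c(df$site.x, df$site.y)))
n <- length(nams)

# A "dist" vector holds exactly n * (n - 1) / 2 values, one per pair
stopifnot(nrow(df) == n * (n - 1) / 2)
stopifnot(!anyDuplicated(df[c("site.x", "site.y")]))
# Assumption: within each pair, site.x sorts before site.y
stopifnot(all(df$site.x < df$site.y))

# Lower triangle, column by column: order by site.x, then site.y
ord <- order(df$site.x, df$site.y)
dij_sorted <- structure(df$Distance[ord],
                        Size = n, Labels = nams,
                        Diag = FALSE, Upper = FALSE,
                        method = "user", class = "dist")
```

Printing dij_sorted, or looking up a single pair with as.matrix(dij_sorted)["A", "C"], is a quick way to confirm the values landed where expected.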

Gavin Simpson answered Nov 07 '22 22:11


I had a similar problem not too long ago and solved it like this:

n <- max(table(df$site.x)) + 1  # +1 so there is room for the diagonal
res <- lapply(with(df, split(Distance, df$site.x)), function(x) c(rep(NA, n - length(x)), x))
res <- do.call("rbind", res)
res <- rbind(res, rep(NA, n))
res <- as.dist(t(res))

johannes answered Nov 07 '22 22:11


?as.dist() should help you, though it expects a matrix as input.
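One way to get such a matrix from the long-format data frame is to fill a labelled square matrix pair by pair and hand it to as.dist(). The helper pairs_to_dist below is hypothetical (it is not part of base R or of this answer), and the sketch assumes each pair appears once, in either direction.

```r
df <- data.frame(site.x = c("A", "A", "A", "B", "B", "C"),
                 site.y = c("B", "C", "D", "C", "D", "D"),
                 Distance = c(67, 57, 64, 60, 67, 60))

# Hypothetical helper: long-format pairs -> "dist" via a full matrix.
# Column names default to those used in the question.
pairs_to_dist <- function(d, from = "site.x", to = "site.y",
                          value = "Distance") {
  a <- as.character(d[[from]])
  b <- as.character(d[[to]])
  labs <- sort(unique(c(a, b)))
  m <- matrix(NA_real_, length(labs), length(labs),
              dimnames = list(labs, labs))
  # A two-column character matrix indexes cells by row/column name
  m[cbind(a, b)] <- d[[value]]
  m[cbind(b, a)] <- d[[value]]  # mirror so the matrix is symmetric
  diag(m) <- 0
  as.dist(m)                    # keeps only the lower triangle
}

# Row order does not matter here, because cells are filled by label
d_from_matrix <- pairs_to_dist(df[c(6, 2, 5, 1, 4, 3), ])
```

Because the matrix starts out as NA, any pair missing from the data frame shows up as NA in the result instead of silently shifting the other values.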

Roland answered Nov 07 '22 22:11


For people coming in from Google... The acast function in the reshape2 package is way easier for this kind of stuff. Note that it returns a wide matrix rather than a "dist" object.

library(reshape2)
acast(df, site.x ~ site.y, value.var='Distance', fun.aggregate = sum, margins=FALSE)

user4440672 answered Nov 07 '22 21:11