From a very simple dataframe like
time1 <- as.Date("2010/10/10")
time2 <- as.Date("2010/10/11")
time3 <- as.Date("2010/10/12")
test <- data.frame(Sample=c("A","B", "C"), Date=c(time1, time2, time3))
how can I obtain a matrix of pairwise temporal distances (elapsed time in days) between the samples A, B and C?
A B C
A 0 1 2
B 1 0 1
C 2 1 0
Edit: changed the format of the dates. Sorry for the inconvenience.
To get actual day counts, you can convert the dates to the number of days elapsed since some predefined date and then use dist. Example below (I changed your dates, since I doubt they were represented the way you expected):
time1 <- as.Date("02/10/10","%m/%d/%y")
time2 <- as.Date("02/10/11","%m/%d/%y")
time3 <- as.Date("02/10/12","%m/%d/%y")
test <- data.frame(Sample=c("A","B", "C"), Date=c(time1, time2, time3))
days_s2010 <- difftime(test$Date,as.Date("01/01/10","%m/%d/%y"))
dist_days <- as.matrix(dist(days_s2010,diag=TRUE,upper=TRUE))
rownames(dist_days) <- test$Sample; colnames(dist_days) <- test$Sample
dist_days
then prints out:
> dist_days
A B C
A 0 365 730
B 365 0 365
C 730 365 0
Actually, dist doesn't need the dates converted to days since some reference date; simply calling dist(test$Date) will work for days.
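A minimal sketch of that shortcut, assuming the one-day-apart dates from the question. The names d and m are only illustrative:

```r
# Rebuild the question's data frame (dates one day apart)
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# dist() treats the dates as day counts, so the distances are in days
d <- dist(test$Date)

# Expand to a full symmetric matrix and label it with the sample names
m <- as.matrix(d)
dimnames(m) <- list(as.character(test$Sample), as.character(test$Sample))
m
```

With these dates, m should match the matrix requested in the question.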
Using outer()
You don't need to work with a data frame. In your example, we can collect your dates in a single vector and use outer():
x <- c(time1, time2, time3)
abs(outer(x, x, "-"))
[,1] [,2] [,3]
[1,] 0 1 2
[2,] 1 0 1
[3,] 2 1 0
Note that I have wrapped the call in abs(), so that you only get positive time differences, i.e., "today - yesterday" and "yesterday - today" both give 1.
If your data are pre-stored in a data frame, you can extract that column as a vector and then proceed.
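As a rough sketch of that last step, assuming the question's data frame test: the Date column is pulled out, turned into plain day counts, and the result is labelled with the sample names. Converting with as.numeric() is a choice made here so that the result is an ordinary numeric matrix rather than a difftime object.

```r
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# Extract the column and convert it to day counts
days <- as.numeric(test$Date)

# Absolute pairwise differences in days
m <- abs(outer(days, days, "-"))

# Label rows and columns with the sample names
dimnames(m) <- list(as.character(test$Sample), as.character(test$Sample))
m
```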
Using dist()
As Konrad mentioned, dist() is often used to compute a distance matrix. Its greatest advantage is that it computes only the lower/upper triangle (the diagonal is 0) and copies the rest. outer(), on the other hand, computes every matrix element, since it knows nothing about the symmetry.
However, dist() takes numerical vectors, and only computes some classes of distance. See ?dist:
Arguments:
x: a numeric matrix, data frame or ‘"dist"’ object.
method: the distance measure to be used. This must be one of
‘"euclidean"’, ‘"maximum"’, ‘"manhattan"’, ‘"canberra"’,
‘"binary"’ or ‘"minkowski"’. Any unambiguous substring can
be given.
But we can work around this in order to use it. A Date object can be turned into a number of days once you pick an origin. With
x <- as.numeric(x - min(x))
we get the number of days since the first date on record. Now we can use dist() with the default Euclidean distance:
y <- as.matrix(dist(x, diag = TRUE, upper = TRUE))
rownames(y) <- colnames(y) <- c("A", "B", "C")
y
A B C
A 0 1 2
B 1 0 1
C 2 1 0
Why I put outer() as my first example
In principle, a time difference is signed. In that case, outer(x, x, "-") is more appropriate. I added the abs() later, because it seems that you intentionally want positive results.
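To illustrate the signed case, here is a small sketch using the question's three dates. The name signed is only illustrative, and as.numeric() is used so the result is a plain numeric matrix of days.

```r
x <- as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
days <- as.numeric(x)

# signed[i, j] is x[i] - x[j], so the sign tells which date is later
signed <- outer(days, days, "-")
signed

# The signed matrix is antisymmetric: swapping the two dates flips the sign
all(signed == -t(signed))
```

Taking abs(signed) recovers the symmetric distance matrix.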
Also, outer() has far broader use than dist(). Have a look at my answer here. That OP asks for computing Hamming distance, which is really a kind of bitwise distance.
A really fast solution using a data.table approach in two steps:
# load library
library(reshape)
library(data.table)
# 1. Get all possible combinations of pairs of dates in long format
df <- expand.grid.df(test, test)
colnames(df) <- c("Sample", "Date", "Sample2", "Date2")
# 2. Calculate distances in days, weeks or hours, minutes etc
setDT(df)[, datedist := difftime(Date2, Date, units ="days")]
df
#> Sample Date Sample2 Date2 datedist
#> 1: A 2010-10-10 A 2010-10-10 0 days
#> 2: B 2010-10-11 A 2010-10-10 -1 days
#> 3: C 2010-10-12 A 2010-10-10 -2 days
#> 4: A 2010-10-10 B 2010-10-11 1 days
#> 5: B 2010-10-11 B 2010-10-11 0 days
#> 6: C 2010-10-12 B 2010-10-11 -1 days
#> 7: A 2010-10-10 C 2010-10-12 2 days
#> 8: B 2010-10-11 C 2010-10-12 1 days
#> 9: C 2010-10-12 C 2010-10-12 0 days
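The long table above is not yet the matrix the question asks for. One possible way to get there, sketched under the assumption that data.table is installed: the pairs are built with CJ() on row indices (a stand-in for expand.grid.df, so the reshape package is not needed here), and the names pairs, long, wide and m are only illustrative.

```r
library(data.table)

test <- data.table(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# All ordered pairs of row indices
pairs <- CJ(r1 = seq_len(nrow(test)), r2 = seq_len(nrow(test)))

# Long format: one row per pair, absolute distance in days as a plain number
long <- pairs[, .(
  Sample  = test$Sample[r1],
  Sample2 = test$Sample[r2],
  days    = abs(as.numeric(difftime(test$Date[r2], test$Date[r1], units = "days")))
)]

# Reshape to wide format, then turn it into a matrix with sample names as row names
wide <- dcast(long, Sample ~ Sample2, value.var = "days")
m <- as.matrix(wide, rownames = "Sample")
m
```

Dropping the abs() keeps the signed differences, as in the long table above.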