
Temporal distance matrix from dates

Tags:

date

datetime

r

From a very simple dataframe like

    time1 <- as.Date("2010/10/10")
    time2 <- as.Date("2010/10/11")
    time3 <- as.Date("2010/10/12")
    test <- data.frame(Sample=c("A","B", "C"), Date=c(time1, time2, time3))

how can I obtain a matrix of pairwise temporal distances (elapsed time in days) between the samples A, B and C?

   A  B  C
A  0  1  2
B  1  0  1
C  2  1  0

Edit: changed the format of the dates. Sorry for the inconvenience.

nouse asked Jun 22 '16 12:06


3 Answers

To get distances in actual days, you can convert each date to the number of days elapsed since some pre-defined reference date and then use dist(). Example below (I converted your dates, because I doubt they were represented the way you expected):

time1 <- as.Date("02/10/10","%m/%d/%y")
time2 <- as.Date("02/10/11","%m/%d/%y")
time3 <- as.Date("02/10/12","%m/%d/%y")
test <- data.frame(Sample=c("A","B", "C"), Date=c(time1, time2, time3))
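# Days elapsed since a fixed reference date (1 Jan 2010), as plain numbers
days_s2010 <- as.numeric(difftime(test$Date, as.Date("01/01/10", "%m/%d/%y"), units = "days"))
# Full symmetric matrix of pairwise absolute differences
dist_days <- as.matrix(dist(days_s2010, diag = TRUE, upper = TRUE))
rownames(dist_days) <- colnames(dist_days) <- test$Sample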

dist_days then prints out:

> dist_days
    A   B   C
A   0 365 730
B 365   0 365
C 730 365   0

Actually, dist() does not need the dates converted to days since a reference date first. Simply calling dist(test$Date) works and gives the differences in days.
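As a rough sketch of that shortcut applied to the dates from the question: the data frame is rebuilt here so the snippet stands alone, and the explicit as.numeric() call is my own addition to make the conversion to day counts visible rather than relying on implicit coercion.

```r
# Rebuild the question's data frame (dates one day apart).
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# as.numeric() turns each Date into days since 1970-01-01, so the
# one-dimensional Euclidean distance is the absolute difference in days.
days <- as.numeric(test$Date)
dist_days <- as.matrix(dist(days))

# Label rows and columns with the sample names.
samples <- as.character(test$Sample)
dimnames(dist_days) <- list(samples, samples)
dist_days
```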

Andy W answered Oct 10 '22 18:10


Using outer()

You don't need to work with a data frame. In your example, we can collect the dates in a single vector and use outer():

x <- c(time1, time2, time3)
abs(outer(x, x, "-"))

     [,1] [,2] [,3]
[1,]    0    1    2
[2,]    1    0    1
[3,]    2    1    0

Note that I wrapped the call in abs() so that you only get positive time differences, i.e., "today - yesterday" and "yesterday - today" are both 1.

If your data are pre-stored in a data frame, you can extract that column as a vector and then proceed.
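A minimal sketch of that data frame route, assuming the column names Sample and Date from the question. Converting the dates to numbers and naming the vector with setNames() are my own choices: outer() carries the names over as row and column labels, so the result comes out labelled A, B, C instead of [1,], [2,], [3,].

```r
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# Extract the date column as a numeric vector of day counts,
# named by sample so that outer() labels the result.
d <- setNames(as.numeric(test$Date), as.character(test$Sample))

# Entry [i, j] is |d[i] - d[j]|, the elapsed days between samples i and j.
m <- abs(outer(d, d, "-"))
m
```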

Using dist()

As Konrad mentioned, dist() is often used to compute distance matrices. Its greatest advantage is that it computes only one triangle of the matrix (the diagonal is 0) and copies the result to the other triangle. outer(), on the other hand, computes every matrix element because it knows nothing about the symmetry.

However, dist() is documented as taking numeric input and only computes a fixed set of distance measures. See ?dist:

Arguments:

       x: a numeric matrix, data frame or ‘"dist"’ object.

  method: the distance measure to be used. This must be one of
          ‘"euclidean"’, ‘"maximum"’, ‘"manhattan"’, ‘"canberra"’,
          ‘"binary"’ or ‘"minkowski"’.  Any unambiguous substring can
          be given.

But we can work around this restriction.

A Date object can be coerced to a number once you fix an origin. With

x <- as.numeric(x - min(x))

we get the number of days since the first day on record. Now we can use dist() with the default Euclidean distance:

y <- as.matrix(dist(x, diag = TRUE, upper = TRUE))
rownames(y) <- colnames(y) <- c("A", "B", "C")

  A B C
A 0 1 2
B 1 0 1
C 2 1 0

Why I put outer() first

In principle, a time difference is signed. In that case,

outer(x, x, "-")

is more appropriate. I added abs() afterwards because you seem to want only positive results.
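To make the signed versus unsigned distinction concrete, here is a small sketch using the question's three dates. The names signed and unsigned are only illustrative. With outer(d, d, "-"), entry [i, j] is d[i] - d[j], so the signed matrix is antisymmetric while the abs() version is symmetric.

```r
# Day counts for the three samples, named so the matrices are labelled.
d <- setNames(
  as.numeric(as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))),
  c("A", "B", "C")
)

# Signed differences: row sample minus column sample.
signed <- outer(d, d, "-")
signed["A", "C"]   # A is two days before C, so this is negative
signed["C", "A"]   # the mirrored entry has the opposite sign

# Unsigned distances, as requested in the question.
unsigned <- abs(signed)
```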

Also, outer() has far broader use than dist(). Have a look at my answer here. The OP there asks how to compute Hamming distance, which is really a kind of bitwise distance.

Zheyuan Li answered Oct 10 '22 18:10


A really fast solution using a data.table approach in two steps:

# load libraries
library(reshape)     # provides expand.grid.df()
library(data.table)

# 1. Get all possible combinations of pairs of dates in long format
df <- expand.grid.df(test, test)
colnames(df) <- c("Sample", "Date", "Sample2", "Date2")

# 2. Calculate distances in days, weeks or hours, minutes etc
setDT(df)[, datedist := difftime(Date2, Date, units ="days")]

df
#>    Sample       Date Sample2      Date2 datedist
#> 1:      A 2010-10-10       A 2010-10-10   0 days
#> 2:      B 2010-10-11       A 2010-10-10  -1 days
#> 3:      C 2010-10-12       A 2010-10-10  -2 days
#> 4:      A 2010-10-10       B 2010-10-11   1 days
#> 5:      B 2010-10-11       B 2010-10-11   0 days
#> 6:      C 2010-10-12       B 2010-10-11  -1 days
#> 7:      A 2010-10-10       C 2010-10-12   2 days
#> 8:      B 2010-10-11       C 2010-10-12   1 days
#> 9:      C 2010-10-12       C 2010-10-12   0 days
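
The long table above can be folded back into the square matrix the question asks for. The following is only a base-R sketch under my own assumptions: it rebuilds the pair table with expand.grid() on row indices instead of expand.grid.df() and data.table, keeps the same column names (Sample, Date, Sample2, Date2, datedist), and fills the matrix by indexing on the sample labels.

```r
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# 1. All ordered pairs of rows, in long format.
idx <- expand.grid(i = seq_len(nrow(test)), j = seq_len(nrow(test)))
long <- data.frame(
  Sample  = as.character(test$Sample[idx$i]),
  Date    = test$Date[idx$i],
  Sample2 = as.character(test$Sample[idx$j]),
  Date2   = test$Date[idx$j]
)

# 2. Signed difference in days for each pair, as a plain number.
long$datedist <- as.numeric(difftime(long$Date2, long$Date, units = "days"))

# 3. Fill a labelled square matrix with the absolute differences.
samples <- as.character(test$Sample)
m <- matrix(NA_real_, nrow = length(samples), ncol = length(samples),
            dimnames = list(samples, samples))
m[cbind(long$Sample, long$Sample2)] <- abs(long$datedist)
m
```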
rafa.pereira answered Oct 10 '22 20:10