Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient multidimensional dynamic time warping implementation

Here's how the literature explains how to compute multidimensional dynamic time warping of two time series:

 library(dtw)
 x<- cbind(1:10,1)
 y<- cbind(11:15,2)
 cxdist <-dist(x,y,method="euclidean")
 dtw(cxdist)$distance

In fact it first computes the cross distance matrix and then use it as input in the dtw function.

I'd like to use multidimensional dynamic time warping in image classification with quite large images. Image values are stored in a data frame that could look like this:

 inDf <- data.frame(matrix(rnorm(60), ncol = 6))
 colnames(inDf) <- c('var1t1','var2t1','var1t2','var2t2','var1t3','var2t3')

In this example, there are two variables (var1 and var2) observed three times.

The question is how to get the dtw distance matrix with the as much efficiency as possible regarding computing intensity?

Here are some thoughts: - iterate through each values of the input image matrices, reshape the vectors to matrices in order to be able to compute the cross distances and then compute the dtw distance and store it in a dedicated matrix. This is certainly the most computing intensive solution

like image 353
WAF Avatar asked Nov 02 '22 10:11

WAF


1 Answers

When dealing with intensive computations always makes sense to consider Rcpp package. If you want to get distance matrix with euclidean distances faster, you can implement corresponding Rcpp function:

library(Rcpp)
library(inline)

# Rcpp function for euclidean distance
fastdist <- cxxfunction(signature(x="matrix", y="matrix"), plugin="Rcpp",
body='
  Rcpp::NumericMatrix dx(x);
  Rcpp::NumericMatrix dy(y);

  const int N = dx.nrow();
  const int M = dy.nrow();

  Rcpp::NumericMatrix res(N, M);

  for(int i=0; i<N; i++){
    for(int j=0; j<M; j++){
      res(i,j) = sqrt(sum((dx(i,_)-dy(j,_))*(dx(i,_)-dy(j,_))));
    }
  }

  return res;
')

It uses Rcpp syntactic sugar in order to make code more compact and readable. However, sometimes it's better having wrapper function for checking types, coercing, etc. It's not necessary - you can call fastdist directly. But, anyway, wrapper can look like this:

# Wrapper R function
fast.dist <- function(x, y){
  stopifnot(class(x) %in% c("data.frame","matrix") &
            class(y) %in% c("data.frame","matrix") &
            ncol(x)==ncol(y))

  fastdist(as.matrix(x), as.matrix(y))
}

Now we can turn to literature example.

library(dtw)

# EXAMPLE 1
x<- cbind(1:10,1)
y<- cbind(11:15,2)
# Check results
all.equal(fast.dist(x,y), dist(x,y,method="euclidean"), check.attributes=F)
# [1] "target is matrix, current is crossdist"
all.equal(fast.dist(x,y), matrix(dist(x,y,method="euclidean"), ncol=nrow(y)))
# [1] TRUE

Note, that dist returns result of class crossdist. So, for comparison it should be coerced to matrix.

And now your primary question - we're generating data first:

# EXAMPLE 2
set.seed(1234)
N <- 100
inDf <- data.frame(matrix(rnorm(6*N), ncol = 6))
colnames(inDf) <- c('var1t1','var2t1','var1t2','var2t2','var1t3','var2t3')

# Extracting variables
var1 <- inDf[,c("var1t1","var1t2","var1t3")]
var2 <- inDf[,c("var2t1","var2t2","var2t3")]

I'm not completely sure about your data structure, but in any case you can always prepare variables according to your needs.

Comparison and benchmarking:

library(rbenchmark)

all.equal(fast.dist(var1,var2), matrix(dist(var1,var2), ncol=N))
# [1] TRUE
benchmark(fast.dist(var1,var2), dist(var1,var2), order="relative")[,1:4]
#                    test replications elapsed relative
# 1 fast.dist(var1, var2)          100   0.081    1.000
# 2      dist(var1, var2)          100   0.246    3.037

fast.dist is roughly 3 times faster than dist in this case. However, while N is growing the relative speed-up will go down.

Also note, as were mentioned in comments, dtw can compute distance matrix by itself. Nevertheless, it's more efficient to have distance matrix precomputed. See quick test below:

cxdist <- fast.dist(var1,var2)
benchmark(dtw(cxdist)$distance, dtw(var1,var2)$distance, order="relative")[,1:4]
#                       test replications elapsed relative
# 1     dtw(cxdist)$distance          100   0.476    1.000
# 2 dtw(var1, var2)$distance          100   0.736    1.546

Also, if you're interested only in $distance you can pass distance.only=T to dtw() - it gives some speed-up.

like image 121
redmode Avatar answered Nov 04 '22 06:11

redmode