Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Different results and performances with different libraries

Tags:

python

dtw

I'm comparing the libraries dtaidistance, fastdtw and cdtw for DTW computations. This is my code:

from fastdtw import fastdtw
from cdtw import pydtw
import fastdtw
import array
from timeit import default_timer as timer
from dtaidistance import dtw, dtw_visualisation as dtwvis

s1 = mySampleSequences[0] # first sample sequence consisting of 3000 samples
s2 = mySampleSequences[1] # second sample sequence consisting of 3000 samples

start = timer()
distance1 = dtw.distance(s1, s2)
end = timer()
start2 = timer()
distance2 = dtw.distance_fast(array.array('d',s1),array.array('d',s2))
end2 = timer()
start3 = timer()
distance3, path3 = fastdtw(s1,s2)
end3 = timer()
start4 = timer()
distance4 = pydtw.dtw(s1,s2).get_dist()
end4 = timer()

print("dtw.distance(x,y) time: "+ str(end - start))
print("dtw.distance(x,y) distance: "+str(distance1))
print("dtw.distance_fast(x,y) time: "+ str(end2 - start2))
print("dtw.distance_fast(x,y) distance: " + str(distance2))
print("fastdtw(x,y) time: "+ str(end3 - start3))
print("fastdtw(x,y) distance: " + str(distance3))
print("pydtw.dtw(x,y) time: "+ str(end4 - start4))
print("pydtw.dtw(x,y) distance: " + str(distance4))

This is the output I get:

  • dtw.distance(x,y) time: 22.16925272245262
  • dtw.distance(x,y) distance: 1888.8583853746156
  • dtw.distance_fast(x,y) time: 0.3889036471839056
  • dtw.distance_fast(x,y) distance: 1888.8583853746156
  • fastdtw(x,y) time: 0.23296659641047412
  • fastdtw(x,y) distance: 27238.0
  • pydtw.dtw(x,y) time: 0.13706478039556558
  • pydtw.dtw(x,y) distance: 17330.0

My question is: Why do I get different performances and different distances? Thank you very much for your comments.

// edit: The unit of the time measurements is seconds.

like image 848
Hagbard Avatar asked Sep 24 '18 12:09

Hagbard


1 Answers

Some additional information on top of Felipe Mello's informative answer (disclaimer: author of DTAIDistance here).

For the distance results:

  • DTAIDistance only uses Euclidean distance (or L2 norm), this is hardcoded. This choice was made to speed up the execution of the C-code (no function calls). The 'fast' refers to using the C-based implementation instead of a pure Python version and both methods thus give the exact same results.
  • FastDTW is a different algorithm than DTW. It is a linear approximation. The 'fast' refers to a lower complexity.
  • cDTW. I'm not very familiar with this toolbox but it seems to implement L1 norm.

For the speed results:

In general, pure C-based algorithms are ~100 times faster than pure Python ones (in DTAIDistance this is the difference between distance() and distance_fast()). For the C-based methods the differences are mainly due to flexibility of the methods. Passing a custom norm, for example, will slow down the method (more function calls). Also, different methods have different options which cause more or less switch statements in the algorithm. For example, DTAIDistance, offers quite a number of options to tune the method because it prefers early stopping the computation over further optimizations (also observed by Felipe Mello). Furthermore, different methods store different amounts of data. The DTAIDistance distance method does not store the entire matrix to also offer linear space complexity (the full matrix is obtained using the warping_paths method that has quadratic space complexity). In general for DTW it is recommended to use a window to reduce also the time complexity a bit.

For DTAIDistance, all the design choices were made with timeseries clustering applications in mind (the distance_matrix_fast method). This is another reason not to allow custom norms. The DTW code needs to be pure C to support parallelization on the level of C-code and have minimal overhead (it uses OpenMP) to compute all pairwise distances between series.

like image 135
wannesm Avatar answered Sep 18 '22 10:09

wannesm