I'm comparing the libraries dtaidistance, fastdtw and cdtw for DTW computations. This is my code:
from fastdtw import fastdtw
from cdtw import pydtw
import fastdtw
import array
from timeit import default_timer as timer
from dtaidistance import dtw, dtw_visualisation as dtwvis
s1 = mySampleSequences[0] # first sample sequence consisting of 3000 samples
s2 = mySampleSequences[1] # second sample sequence consisting of 3000 samples
start = timer()
distance1 = dtw.distance(s1, s2)
end = timer()
start2 = timer()
distance2 = dtw.distance_fast(array.array('d',s1),array.array('d',s2))
end2 = timer()
start3 = timer()
distance3, path3 = fastdtw(s1,s2)
end3 = timer()
start4 = timer()
distance4 = pydtw.dtw(s1,s2).get_dist()
end4 = timer()
print("dtw.distance(x,y) time: "+ str(end - start))
print("dtw.distance(x,y) distance: "+str(distance1))
print("dtw.distance_fast(x,y) time: "+ str(end2 - start2))
print("dtw.distance_fast(x,y) distance: " + str(distance2))
print("fastdtw(x,y) time: "+ str(end3 - start3))
print("fastdtw(x,y) distance: " + str(distance3))
print("pydtw.dtw(x,y) time: "+ str(end4 - start4))
print("pydtw.dtw(x,y) distance: " + str(distance4))
This is the output I get:
My question is: Why do I get different performances and different distances? Thank you very much for your comments.
// edit: The unit of the time measurements is seconds.
Some additional information on top of Felipe Mello's informative answer (disclaimer: author of DTAIDistance here).
For the distance results:
For the speed results:
In general, pure C-based algorithms are ~100 times faster than pure Python ones (in DTAIDistance this is the difference between distance() and distance_fast()). For the C-based methods the differences are mainly due to flexibility of the methods. Passing a custom norm, for example, will slow down the method (more function calls). Also, different methods have different options which cause more or less switch statements in the algorithm. For example, DTAIDistance, offers quite a number of options to tune the method because it prefers early stopping the computation over further optimizations (also observed by Felipe Mello). Furthermore, different methods store different amounts of data. The DTAIDistance distance method does not store the entire matrix to also offer linear space complexity (the full matrix is obtained using the warping_paths method that has quadratic space complexity). In general for DTW it is recommended to use a window to reduce also the time complexity a bit.
For DTAIDistance, all the design choices were made with timeseries clustering applications in mind (the distance_matrix_fast method). This is another reason not to allow custom norms. The DTW code needs to be pure C to support parallelization on the level of C-code and have minimal overhead (it uses OpenMP) to compute all pairwise distances between series.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With