
Alternatives to the Dynamic Time Warping (DTW) method

Tags:

c#

time-series

I am doing some research into methods of comparing time series data. One of the algorithms that I have found being used for matching this type of data is the DTW (Dynamic Time Warping) algorithm.

The data I have, resemble the following structure (this can be one path):

Path    Event      Time            Location (x,y)
  1       1       2:30:02             1,5
  1       2       2:30:04             2,7
  1       3       2:30:06             4,4
...
...

Now, I was wondering whether there are other algorithms that would be suitable to find the closest match for the given path.

asked Jan 13 '12 by user496607

People also ask

Why is DTW not a metric?

Abstract. In general, the inter-word dissimilarity measure supplied by Dynamic Time Warping algorithms can not be assumed to be a metric because it does not fully satisfy all the required properties (the triangle inequality in particular).

Why is Dynamic Time Warping useful?

Dynamic Time Warping (DTW) is a way to compare two (usually temporal) sequences that do not sync up perfectly. It is a method to calculate the optimal matching between two sequences. DTW is useful in many domains such as speech recognition, data mining, financial markets, etc.

Is Dynamic Time Warping AI?

Dynamic Time Warping (DTW) is an A.I. technique that has proven very useful for normalizing and comparing data of unequal length, and it also copes with inputs that progress at varying speeds.

What is FastDTW?

FastDTW is an approximate Dynamic Time Warping (DTW) algorithm that provides optimal or near-optimal alignments with an O(N) time and memory complexity, in contrast to the O(N^2) requirement for the standard DTW algorithm.
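
For reference, here is a minimal sketch (my own illustration, not taken from any particular library) of the standard O(N^2) dynamic-programming DTW that FastDTW approximates. It works directly on paths of (x, y) points like the ones in the question and uses the Euclidean distance between individual points as the local cost:

using System;
using System.Collections.Generic;

static class Dtw
{
    // Standard dynamic-programming DTW between two paths of (x, y) points.
    // It fills an n-by-m cost table, which is where the O(N^2) time and
    // memory of plain DTW come from.
    public static double Distance(IList<Tuple<int, int>> a, IList<Tuple<int, int>> b)
    {
        int n = a.Count, m = b.Count;
        var d = new double[n + 1, m + 1];

        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                d[i, j] = double.PositiveInfinity;
        d[0, 0] = 0;

        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
            {
                // Local cost: Euclidean distance between the two points.
                double dx = a[i - 1].Item1 - b[j - 1].Item1;
                double dy = a[i - 1].Item2 - b[j - 1].Item2;
                double cost = Math.Sqrt(dx * dx + dy * dy);

                // Extend the cheapest of the three possible predecessors
                // (match, repeat a point of a, repeat a point of b); this
                // repetition of points is the "warping".
                d[i, j] = cost + Math.Min(d[i - 1, j - 1],
                                 Math.Min(d[i - 1, j], d[i, j - 1]));
            }
        return d[n, m];
    }
}

A smaller return value means a closer match. FastDTW approximates the same quantity, but only evaluates a narrow band of this table at each of several resolution levels instead of the full N^2 cells.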


2 Answers

The keyword you are looking for is "(dis-)similarity measures".

Euclidean Distance (ED), as described by Adam Mihalcin in the other answer, is easy to compute and roughly matches the everyday meaning of the word "distance". Yet when comparing two time series, DTW is to be preferred, especially when applied to real-world data, for two reasons:

1) ED can only be applied to series of equal length. When points are missing, ED simply cannot be computed (unless the other sequence is cut down as well, losing even more information).

2) ED does not allow time-shifting or time-warping, as opposed to algorithms based on DTW (see the small sketch below).
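
As a small illustration of point 2 (my own sketch, not part of the original answer): take two series that contain the same shape, just shifted by one sample. ED compares them index by index and reports a non-zero distance, whereas the shifted alignment that a DTW warping path is allowed to choose costs nothing:

using System;
using System.Linq;

class EdVersusWarping
{
    static void Main()
    {
        // Series b is series a shifted left by one sample.
        double[] a = { 0, 0, 1, 2, 1, 0, 0 };
        double[] b = { 0, 1, 2, 1, 0, 0, 0 };

        // Euclidean distance: samples are compared at the same index.
        double ed = Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());

        // Cost of aligning a[i] with b[i - 1], the kind of shifted matching
        // a warping path can perform (a full DTW alignment of these two
        // series also has total cost 0).
        double warped = 0;
        for (int i = 1; i < a.Length; i++)
            warped += Math.Abs(a[i] - b[i - 1]);

        Console.WriteLine("ED:     " + ed);      // 2, penalizes the shift
        Console.WriteLine("Warped: " + warped);  // 0, the shapes match
    }
}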

Thus ED is not a real alternative to DTW, because its requirements and restrictions are much stricter. But to answer your question, I recommend this paper:

Time-series clustering – A decade review. Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, Teh Ying Wah. http://www.sciencedirect.com/science/article/pii/S0306437915000733

This paper gives an overview of the (dis-)similarity measures used in time series clustering and is well worth reading in full.


answered Nov 15 '22 by Nikolas Rieble


If two paths are the same length, say n, then they are really points in a 2n-dimensional space. The first location determines the first two dimensions, the second location determines the next two dimensions, and so on. For example, if we just take the three points in your example, the path can be represented as the single 6-dimensional point (1, 5, 2, 7, 4, 4). If we want to compare this to another three-point path, we can compute either the Euclidean distance (square root of the sum of squares of per-dimension differences between the two points) or the Manhattan distance (sum of the absolute per-dimension differences).

For example, the boring path that stays at (0, 0) for all three times becomes the 6-dimensional point (0, 0, 0, 0, 0, 0). Then the Euclidean distance between this point and your example path is sqrt((1-0)^2 + (5-0)^2 + (2-0)^2 + (7-0)^2 + (4-0)^2 + (4-0)^2) = sqrt(111) = 10.54. The Manhattan distance is abs(1-0) + abs(5-0) + abs(2-0) + abs(7-0) + abs(4-0) + abs(4-0) = 23. This kind of difference between the metrics is not unusual, since the Manhattan distance is provably at least as great as the Euclidean distance.

Of course one problem with this approach is that not all paths will be of the same length. However, you can easily cut the longer path down to the same length as the shorter path, or treat the shorter of the two paths as staying at the same location, or as continuing in the same direction, after its measurements end, until both paths are the same length. Either approach will introduce some inaccuracies, but no matter what you do you have to deal with the fact that you are missing data on the shorter path and have to make up for it somehow.

EDIT:

Assuming that path1 and path2 are both List<Tuple<int, int>> objects containing the points, we can cut off the longer list to match the shorter list as:

// Enumerable.Zip stops when it reaches the end of the shorter sequence,
// so the longer path is effectively cut down to the shorter one's length
// (requires using System.Linq)
List<Tuple<int, int, int, int>> matchingPoints = Enumerable.Zip(path1, path2,
    (tupl1, tupl2) =>
        Tuple.Create(tupl1.Item1, tupl1.Item2, tupl2.Item1, tupl2.Item2))
    .ToList();

Then, you can use the following code to find the Manhattan distance:

int manhattanDistance = matchingPoints
    .Sum(tupl => Math.Abs(tupl.Item1 - tupl.Item3)
               + Math.Abs(tupl.Item2 - tupl.Item4));

With the same assumptions as for the Manhattan distance, we can compute the Euclidean distance as:

// Math.Pow returns double, so the sum of squares is a double as well
double euclideanDistanceSquared = matchingPoints
    .Sum(tupl => Math.Pow(tupl.Item1 - tupl.Item3, 2)
               + Math.Pow(tupl.Item2 - tupl.Item4, 2));
double euclideanDistance = Math.Sqrt(euclideanDistanceSquared);
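
For the other option mentioned earlier, treating the shorter path as if it simply stays at its last recorded location, one possible sketch (my own addition, not part of the original answer) pads the shorter list with copies of its last point and then reuses the same Zip/Sum code:

// Assumes the same context as above (System, System.Collections.Generic,
// System.Linq, and the non-empty path1/path2 lists). Repeats the last point
// of a path until it reaches the requested length.
static List<Tuple<int, int>> PadToLength(List<Tuple<int, int>> path, int length)
{
    var padded = new List<Tuple<int, int>>(path);
    while (padded.Count < length)
        padded.Add(padded[padded.Count - 1]); // stay at the last location
    return padded;
}

// Usage: pad both paths to the longer length, then zip p1 and p2 exactly
// as path1 and path2 were zipped above.
int target = Math.Max(path1.Count, path2.Count);
List<Tuple<int, int>> p1 = PadToLength(path1, target);
List<Tuple<int, int>> p2 = PadToLength(path2, target);

Extrapolating "in the same direction" instead would be a similar loop that keeps adding the difference between the last two recorded points.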

answered Nov 15 '22 by Adam Mihalcin