Compare Two Paths for Similarity

I have path set A and path set B. I am trying to find an algorithm to compare both path sets for similarity.

Path characteristics:

  1. Path sets are one or more lines with two or more points per line. Lines do not have to be connected.
  2. Path sets may overlap themselves (e.g. an X shape).
  3. Path sets may contain different numbers of vertices (e.g. one path may look similar to the other but have many more points in it).
  4. Points are not guaranteed to be in order for both path sets.

Scale should be taken into account, i.e. a small X should match a large X. Translation does not need to be taken into account, because the bottommost point of any path will have a y of 0 and the leftmost point of any path will have an x of 0.

Is there a best practice or well-known algorithm (I have found little in my Google searches) for comparing these kinds of path sets for similarity?

jjxtra asked Mar 19 '16 20:03


1 Answer

Algorithmically, I think I would try something like this:

  1. For each path, convert the consecutive pairs of points comprising the path into a list of vectors, where a vector is defined as a pairing of a magnitude (length) and a direction (an angle relative to the X-axis). You can compute these values like this (C#):

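    // Offsets from the start point to the end point of one segment.
    double dx = endPoint.X - startPoint.X;
    double dy = endPoint.Y - startPoint.Y;
    double magnitude = Math.Sqrt((dx * dx) + (dy * dy));
    // Math.Atan2 returns radians; convert to degrees relative to the X-axis.
    double direction = Math.Atan2(dy, dx) * (180 / Math.PI);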
    
  2. Next, "normalize" each vector sequence by combining consecutive vectors that have the same* direction. In other words, replace each such run with a single vector that has the same direction and the sum of their magnitudes. This takes care of the cases where you have more than two points on the same line anywhere on your paths. After this step you should have the same number of vectors in each sequence. (If not, the paths are not similar.)

  3. Figure out the scaling factor. Take the magnitude of the first vector in the first sequence and divide it by the magnitude of the first vector in the second sequence.

  4. Now you can compare the sequences for similarity by iterating over both sequences in tandem. For each pair of corresponding vectors, check that their directions are equal* and that the ratio of their magnitudes is equal* to the scaling factor. If not, the paths are not similar.
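The four steps above could be put together roughly as follows. This is only a sketch under some assumptions: each path is a single polyline whose points are already in order, points are plain `(X, Y)` tuples, and the class name `PathSimilarity`, its method names, and both tolerance constants are made up for illustration. The tolerances stand in for the "equal*" checks and would need tuning for real data.

```csharp
using System;
using System.Collections.Generic;

public static class PathSimilarity
{
    // Hypothetical tolerances; choose values that suit your data.
    const double AngleTolerance = 1e-6;  // degrees
    const double RatioTolerance = 1e-6;  // relative to the scaling factor

    // Step 1: turn consecutive point pairs into (magnitude, direction) vectors.
    public static List<(double Magnitude, double Direction)> ToVectors(
        IReadOnlyList<(double X, double Y)> points)
    {
        var vectors = new List<(double Magnitude, double Direction)>();
        for (int i = 1; i < points.Count; i++)
        {
            double dx = points[i].X - points[i - 1].X;
            double dy = points[i].Y - points[i - 1].Y;
            double magnitude = Math.Sqrt((dx * dx) + (dy * dy));
            // Assumption: repeated points add nothing to the shape, so skip them.
            if (magnitude == 0) continue;
            vectors.Add((magnitude, Math.Atan2(dy, dx) * (180 / Math.PI)));
        }
        return vectors;
    }

    // Step 2: merge runs of consecutive vectors that point the same way.
    public static List<(double Magnitude, double Direction)> Merge(
        List<(double Magnitude, double Direction)> vectors)
    {
        var merged = new List<(double Magnitude, double Direction)>();
        foreach (var v in vectors)
        {
            int last = merged.Count - 1;
            if (last >= 0 && SameDirection(merged[last].Direction, v.Direction))
                merged[last] = (merged[last].Magnitude + v.Magnitude, merged[last].Direction);
            else
                merged.Add(v);
        }
        return merged;
    }

    // Steps 3 and 4: derive the scaling factor, then compare in tandem.
    public static bool AreSimilar(
        IReadOnlyList<(double X, double Y)> a,
        IReadOnlyList<(double X, double Y)> b)
    {
        var va = Merge(ToVectors(a));
        var vb = Merge(ToVectors(b));
        if (va.Count != vb.Count) return false;
        if (va.Count == 0) return true; // assumption: two empty sequences match

        double scale = va[0].Magnitude / vb[0].Magnitude;
        for (int i = 0; i < va.Count; i++)
        {
            if (!SameDirection(va[i].Direction, vb[i].Direction)) return false;
            double ratio = va[i].Magnitude / vb[i].Magnitude;
            if (Math.Abs(ratio - scale) > RatioTolerance * scale) return false;
        }
        return true;
    }

    // Compares angles in degrees, treating 180 and -180 as the same direction.
    static bool SameDirection(double a, double b)
    {
        double diff = Math.Abs(a - b) % 360;
        return Math.Min(diff, 360 - diff) <= AngleTolerance;
    }
}
```

With this sketch, the paths (0,0), (1,1), (2,0) and (0,0), (2,2), (4,4), (8,0) come out as similar: the second is the first scaled by 4 with one extra collinear point, which step 2 merges away.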

*When checking whether two double values are "equal", keep in mind that not every real number can be represented exactly by a double, so you cannot compare two doubles directly and expect accurate results. Instead, decide on an error tolerance appropriate for your situation and check whether the difference between the values you are comparing is within that tolerance. See "What is the most effective way for float and double comparison?" for extensive treatment of the subject.
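One possible shape for such a tolerance check is sketched below. The helper name `Approx.NearlyEqual` and the default tolerances are hypothetical, not taken from any library; the relative tolerance scales with the size of the values, and the absolute tolerance covers values close to zero.

```csharp
using System;

public static class Approx
{
    // True when a and b differ by no more than relTol times the larger
    // magnitude, or by no more than absTol when both are near zero.
    // Both default tolerances are illustrative assumptions.
    public static bool NearlyEqual(double a, double b,
                                   double relTol = 1e-9, double absTol = 1e-12)
    {
        double diff = Math.Abs(a - b);
        double largest = Math.Max(Math.Abs(a), Math.Abs(b));
        return diff <= Math.Max(absTol, relTol * largest);
    }
}
```

For instance, `0.1 + 0.2 == 0.3` is false with doubles, while `Approx.NearlyEqual(0.1 + 0.2, 0.3)` is true.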

Brian Rogers answered Sep 30 '22 15:09