Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find all equals paths in degenerate tree, which start on specific vertex?

I have some degenerate tree (it looks like as array or doubly linked list). For example, it is this tree:

enter image description here

Each edge has some weight. I want to find all equal paths, which starts in each vertex.

In other words, I want to get all tuples (v1, v, v2) where v1 and v2 are an arbitrary ancestor and descendant such that c(v1, v) = c(v, v2).

Let edges have the following weights (it is just example):

a-b = 3

b-c = 1

c-d = 1

d-e = 1

Then:

  1. The vertex A does not have any equal path (there is no vertex from left side).
  2. The vertex B has one equal pair. The path B-A equals to the path B-E (3 == 3).
  3. The vertex C has one equal pair. The path B-C equals to the path C-D (1 == 1).
  4. The vertex D has one equal pair. The path C-D equals to the path D-E (1 == 1).
  5. The vertex E does not have any equal path (there is no vertex from right side).

I implement simple algorithm, which works in O(n^2). But it is too slow for me.

like image 739
David Avatar asked May 15 '15 22:05

David


1 Answers

You write, in comments, that your current approach is

It seems, I looking for a way to decrease constant in O(n^2). I choose some vertex. Then I create two set. Then I fill these sets with partial sums, while iterating from this vertex to start of tree and to finish of tree. Then I find set intersection and get number of paths from this vertex. Then I repeat algorithm for all other vertices.

There is a simpler and, I think, faster O(n^2) approach, based on the so called two pointers method.

For each vertix v go at the same time into two possible directions. Have one "pointer" to a vertex (vl) moving in one direction and another (vr) into another direction, and try to keep the distance from v to vl as close to the distance from v to vr as possible. Each time these distances become equal, you have equal paths.

for v in vertices
    vl = prev(v)
    vr = next(v)
    while (vl is still inside the tree) 
              and (vr is still inside the tree)
        if dist(v,vl) < dist(v,vr)
            vl = prev(vl)
        else if dist(v,vr) < dist(v,vl)
            vr = next(vr)
        else // dist(v,vr) == dist(v,vl)
            ans = ans + 1
            vl = prev(vl)
            vr = next(vr)

(By precalculating the prefix sums, you can find dist in O(1).)

It's easy to see that no equal pair will be missed provided that you do not have zero-length edges.

Regarding a faster solution, if you want to list all pairs, then you can't do it faster, because the number of pairs will be O(n^2) in the worst case. But if you need only the amount of these pairs, there might exist faster algorithms.

UPD: I came up with another algorithm for calculating the amount, which might be faster in case your edges are rather short. If you denote the total length of your chain (sum of all edges weight) as L, then the algorithm runs in O(L log L). However, it is much more advanced conceptually and more advanced in coding too.

Firstly some theoretical reasoning. Consider some vertex v. Let us have two arrays, a and b, not the C-style zero-indexed arrays, but arrays with indexation from -L to L.

Let us define

  • for i>0, a[i]=1 iff to the right of v on the distance exactly i there is a vertex, otherwise a[i]=0
  • for i=0, a[i]≡a[0]=1
  • for i<0, a[i]=1 iff to the left of v on the distance exactly -i there is a vertex, otherwise a[i]=0

A simple understanding of this array is as follows. Stretch your graph and lay it along the coordinate axis so that each edge has the length equal to its weight, and that vertex v lies in the origin. Then a[i]=1 iff there is a vertex at coordinate i.

For your example and for vertex "b" chosen as v:

          a--------b--c--d--e
     --|--|--|--|--|--|--|--|--|-->
      -4 -3 -2 -1  0  1  2  3  4
a: ... 0  1  0  0  1  1  1  1  0 ...  

For another array, array b, we define the values in a symmetrical way with respect to origin, as if we have inverted the direction of the axis:

  • for i>0, b[i]=1 iff to the left of v on the distance exactly i there is a vertex, otherwise b[i]=0
  • for i=0, b[i]≡b[0]=1
  • for i<0, b[i]=1 iff to the right of v on the distance exactly -i there is a vertex, otherwise b[i]=0

Now consider a third array c such that c[i]=a[i]*b[i], asterisk here stays for ordinary multiplication. Obviously c[i]=1 iff the path of length abs(i) to the left ends in a vertex, and the path of length abs(i) to the right ends in a vertex. So for i>0 each position in c that has c[i]=1 corresponds to the path you need. There are also negative positions (c[i]=1 with i<0), which just reflect the positive positions, and one more position where c[i]=1, namely position i=0.

Calculate the sum of all elements in c. This sum will be sum(c)=2P+1, where P is the total number of paths which you need with v being its center. So if you know sum(c), you can easily determine P.

Let us now consider more closely arrays a and b and how do they change when we change the vertex v. Let us denote v0 the leftmost vertex (the root of your tree) and a0 and b0 the corresponding a and b arrays for that vertex.

For arbitrary vertex v denote d=dist(v0,v). Then it is easy to see that for vertex v the arrays a and b are just arrays a0 and b0 shifted by d:

a[i]=a0[i+d]
b[i]=b0[i-d]

It is obvious if you remember the picture with the tree stretched along a coordinate axis.

Now let us consider one more array, S (one array for all vertices), and for each vertex v let us put the value of sum(c) into the S[d] element (d and c depend on v).

More precisely, let us define array S so that for each d

S[d] = sum_over_i(a0[i+d]*b0[i-d])

Once we know the S array, we can iterate over vertices and for each vertex v obtain its sum(c) simply as S[d] with d=dist(v,v0), because for each vertex v we have sum(c)=sum(a0[i+d]*b0[i-d]).

But the formula for S is very simple: S is just the convolution of the a0 and b0 sequences. (The formula does not exactly follow the definition, but is easy to modify to the exact definition form.)

So what we now need is given a0 and b0 (which we can calculate in O(L) time and space), calculate the S array. After this, we can iterate over S array and simply extract the numbers of paths from S[d]=2P+1.

Direct application of the formula above is O(L^2). However, the convolution of two sequences can be calculated in O(L log L) by applying the Fast Fourier transform algorithm. Moreover, you can apply a similar Number theoretic transform (don't know whether there is a better link) to work with integers only and avoid precision problems.

So the general outline of the algorithm becomes

calculate a0 and b0   // O(L)
calculate S = corrected_convolution(a0, b0)    // O(L log L)
v0 = leftmost vertex (root)
for v in vertices:
    d = dist(v0, v)
    ans = ans + (S[d]-1)/2

(I call it corrected_convolution because S is not exactly a convolution, but a very similar object for which a similar algorithm can be applied. Moreover, you can even define S'[2*d]=S[d]=sum(a0[i+d]*b0[i-d])=sum(a0[i]*b0[i-2*d]), and then S' is the convolution proper.)

like image 84
Petr Avatar answered Nov 07 '22 08:11

Petr