Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can one efficiently process shortest path queries in a successor graph?

I'm trying to solve this question: https://cses.fi/problemset/task/1160/

Basically, the question requires the coder to process Q shortest path queries in a successor graph of N nodes, where Q and N are numbers up to 100000. I've been stuck on this problem for days, and none of my ideas are fast enough to past all the test cases.

My first idea was to use some sort of all-pairs shortest path algorithm, such as the Floyd-Warshall algorithm. However, this algorithm would run in O(N^3 + Q) time, which is way too slow for this problem.

My second idea was to process each query individually using a breadth-first search. This algorithm would run in O(Q*N) time, which is faster than my first idea but still too slow for the problem.

I have tried searching online for a solution, but haven't found any with an explanation. Can somebody point me in the right direction?

like image 740
Anthony Smith Avatar asked Jul 30 '20 16:07

Anthony Smith


People also ask

How do you solve the shortest path problem?

Algorithms. The most important algorithms for solving this problem are: Dijkstra's algorithm solves the single-source shortest path problem with non-negative edge weight. Bellman–Ford algorithm solves the single-source problem if edge weights may be negative.

Which of the following types of problem on graphs finds out the shortest path in a graph that visits each vertex at least once starting and ending at the same vertex?

(b) Shortest pathHamilton path problem is a special case of our problem; then, we have the following theorem straightforwardly. Theorem 2. The problem of finding the shortest path with vertex constraint over graphs is NP-hard.

Which algorithm will you use to determine the shortest path between the nodes in a graph?

Dijkstra's algorithm can be used to determine the shortest path from one node in a graph to every other node within the same graph data structure, provided that the nodes are reachable from the starting node. Dijkstra's algorithm can be used to find the shortest path.


Video Answer


2 Answers

Start off by finding all nodes with more than one or exactly zero edges entering. This can be done in O(N) and shouldn't produce too many nodes considering that there's exactly one outgoing edge per node.

All nodes have a specific property with respect to this set of nodes (I'll call that set S from now on). We can differentiate between three cases:

  • there's no connection at all between two nodes
  • there's a connection involving no node of S. In this case finding the shortest path is trivial, we have essentially a linkedlist structure between these two nodes
  • there's at least one node of S between them. In this case, there's also exactly one cycle-free path between these two nodes, which traverses one or multiple nodes from S.

Building a lookup-table for the second case is simple. Perform a BFS from each node n in S that terminates once only other nodes in S can be reached and store the distance for each node. Now the distance between two nodes matching the second condition is the absolute difference between the distances of two nodes to n. I'll call this table direct_out. Lookups can be performed by direct_out[n][a], where a is an arbitrary node. In this step we also build a second lookup-table direct_in that provides the same data, but lists the next node from S that can be reached from a node.

This leaves us with a smaller weighted graph. There's still a few things left to optimize though. First of all: cycles in this graph never contain any "exit". This means that any path that enters a cycle ends in that cycle; or to put it another way: each component of the graph contains at most one cycle. This is quite a central since it allows us to build a lookup-table for to precalculate the distance between two nodes in S in linear time. First of all generate a list of all cycles in the graph (we're interested in a set of nodes for each cycle). Now iteratively create a lookup-table by starting from nodes in S with zero entering edges and continue until a cycle is reached. Perform this for all nodes with zero entries. Now traverse each cycle twice and update the lookup-tables accordingly to complete the lookup tables.

Generating the lookup-tables will take O(N^2) worst-case (should be considerably faster in practice) and distance lookups can be performed in O(1). In pseudo-code the solution would look like this:

def build_lookup(t):
    no_entry = {0 ... N} \ set(t)
    multiple_entries = set(duplicates(t))
    S = no_entry + multiple_entries

    node_to_preceding_S = table()     # preceding node in S for each node
    nodes_con_S = table(default=table)  # distance to preceding node in S for each node
    S_to_S = table(default=table())                  # distance from node in S to next node in S
    S_to_S_rev = table(default=list())

    # build lookup-tables for all connections between two nodes in S (inclusive)
    for n in S:
        dist = 0
        lookup = table()
        c = n

        do
            node_to_preceding_S[c] = n
            lookup[c] = dist
            c = t[c]
            dist += 1
        while c not in S

        nodes_con_S[n] = lookup
        S_to_S[n] = c, dist
        S_to_S_rev[c].append(n)

    # generate list of cycles
    pred = table()
    cycles = set()
    all_cycle_nodes = set()
    for n in no_entry:
        pred = null
        visited = set()
        while n not in visited() and n not in all_cycle_nodes:
            visited.add(n)
            pred[S_to_S[n][0]] = n
            n = S_to_S[n][0]
        
        if n not in all_cycle_nodes:
            # found a cycle => backtrack
            c = n
            cycle = set()
            do
                cycle.add(c)
                all_cycle_nodes.add(c)
                c = pred[c]
            while c != n
            cycles.add(cycle)

    # generate topological ordering of nodes ignoring cycles
    nodes = list()
    incoming = table(default=0)

    # lookup-tables for count of edges pointing to each node (ignoring cycle-edges)
    # this table only contains nodes that aren't only endpoints of edges contained in cycles
    for s, c in enumerate(t):
        if s not in all_cycle_nodes:
            incoming[c] += 1

    # topological sorting
    q = queue(incoming.keys())  
    while q is not empty:
        n = q.dequeue()
        if incoming[n] != 0:
            q.enqueue(n)
            continue

        nodes.append(n)
        incoming[t[n]] -= 1

    # generate lookup-tables for nodes in S
    S_dist_lookup = table(default=table())
    for n in nodes:
        S_dist_lookup[n][n] = 0
        c, d = S_to_S[n]
        S_dist_lookup[c] += S_dist_lookup[n].increment_all_values(d)

    # process members of cycles            
    visited = table(default=0)
    for n in all_cycle_nodes:
        if visited[n] == 2
            continue
        cum_table = table()
        
        do
            visited[n] += 1
            c, d = S_to_S[n]
            if visited[n] == 2:
                S_dist_lookup[n] = cum_table
            else:
                cum_table.increment_all_values(d)
                cum_table.update_keep_min(S_dist_lookup[n])
            n = c
        while visited[n] != 3

    return node_to_preceding_S, nodes_con_S, S_dist_lookup

Lookups can then be performed this way:

def lookup(a, b):
    s_a = nodes_to_preceding_S[a]
    s_b = nodes_to_preceding_S[b]
    
    # case 1
    if s_a not in S_dist_lookup[s_b].keys()
        return -1

    # case 2
    if s_a == s_b
        if nodes_con_S[s_a][b] < nodes_con_S[s_a][a]
            return -1
        return nodes_con_S[s_a][b] - nodes_con_S[s_a][a]

    # case 3
    d_S = S_dist_lookup[s_b][s_a]
    d_b = nodes_con_S[s_b][b]
    if a == s_a:
        return d_S + d_b
    else:
        d_a = S_dist_lookup[s_b][s_a]
        return d_a + d_S + d_b

EDIT:
As promised here's an attempt at an explanation with a diagram: example graph

In the above graph blue nodes represent members of S. The complete graph may have a different structure, but this one represent all relevant possibilities (in fact the {12,13} subgraph is just here to show how multiple components are handled). Now the lookup-tables would have the following structure:

node_to_preceding_S = {
    1: 1, 2: 1, 3: 1,
    5: 4, 6: 4, 7: 4,
    10: 10, 11: 10, 8: 10,
    9: 8, 4: 8,
    12: 13, 13: 12
}

Note that for this table lookup from members of S to other members of S aren't deterministic, since there may be multiple "preceding" members of S and any might be put into the table.

nodes_con_S = {
    1: {1: 0, 2: 1, 3: 2, 4: 3},
    4: {4: 0, 5: 1, 6: 2, 7: 3, 8: 4},
    8: {8: 0, 9: 1, 4: 2},
    10: {10: 0, 11: 1, 8: 2},
    12: {12: 0, 13: 1},
    13: {13: 0, 12: 1}
}

S_dist_lookup = {
    1: {1: 0},
    4: {4: 0, 1: 3},
    8: {8: 0, 4: 4, 1: 7},
    10: {10: 0},
    12: {12: 0, 13: 1},
    13: {13: 0, 12: 1}
}

The basic idea is to create a subgraph of the nodes in S (the blue nodes in the diagram) and the group all nodes between two adjacent blue nodes. The distance for each node to the preceding blue node can be calculated fairly easily, covering case 2. For case 1 it's sufficient to know whether there's a path from the preceding blue node of a to the preceding blue node of b. This can be easily covered by the data stored for case 3. Case 3 is the trickiest, since it also involves finding a cycle (if present) and dealing with rather complex structures. Outside of the cycle that may be contained within each component, any tree may be encountered. These trees can be handled using topological ordering. Examples for such edges in the graph are 1 -> 4 and 10 -> 8. Within the cycle, all connections can be handled by traversing the cycle twice. This way the pairwise connection for each pair of nodes will be handled.

like image 197
Paul Avatar answered Oct 19 '22 23:10

Paul


We can solve this in O(V + Q) time, where V is the number of planets and Q the number of queries. This is a very special directed graph, where every vertex has only one edge directed away from it. It's basically a single dimension line with n points, some of which have us jump ahead or back on that same line. Since there is only one way out of a planet, there is also only one way to travel from one planet to another. A way that's not minimal would mean cycling more than once over this one path.

Iterate on the planets one at a time, following the single path they lead us on. Lets call each one of these starting planets p. Record every planet visited, v, and associate it with p and its distance from p (the number of teleportations it took to get from p to it). Clearly, we don't need to revisit any of them if we store appropriate information related to p and our queries.

As we folllow the path starting at p, if we encounter a planet, s, that's a starting planet in a query, or a planet, d, that's a destination, record the distance from p to it. Note that if a destination for s is reachable, it must be on this path iteration. One of these scenarios must occur: (1) we reach some or all of the destinations associated with s and for each one output the distance from p to it subtracted by the distance from p to s; (2) we do not reach some or all of the destinations associated with s in this path iteration and output -1 for each of those; or (3) we reach a destination for which we have not seen an associated s in this iteration that started on p.

The iteration on p must reach a cycle, back to p or to some node in its path.

(3) means there could be another iteration, on starting planet p', that we haven't yet visited that will contain the s we need. This is the scenario where we use the information we stored about p or a planet in its path. If we encounter p or a planet v recorded in its path (who's distance from p is less than from p to d) during our iteration on the path that starts at p', for each s we encountered during this iteration starting on p', output the distance from p' to p subtracted by the distance from p' to s added to the distance from p to the destination; or the distance from p' to v subtracted by the distance from p' to s added to the distance from p to d subtracted by the distance from p to v; if the record for p contains that destination; otherwise, output -1.

To avoid having to look through irrelevant origin planets when considering a destination found in ps path, only perform that lookup for destinations associated with origins already seen in ps path. Otherwise, just record the destination's association with and distance from p.

like image 42
גלעד ברקן Avatar answered Oct 20 '22 01:10

גלעד ברקן