 

Path-finding efficiency in Python

I have written some code that finds all the paths upstream of a given reach in a dendritic stream network. As an example, if I represent the following network:

     4 -- 5 -- 8
    / 
   2 --- 6 - 9 -- 10
  /           \ 
 1              -- 11
  \
   3 ----7

as a set of parent-child pairs:

{(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}

it will return all of the paths upstream of a node, for instance:

get_paths(h, 1)  # edited: this previously read get_paths(h, 11)
[[Reach(2), Reach(6), Reach(9), Reach(11)], [Reach(2), Reach(6), Reach(9), Reach(10)], [Reach(2), Reach(4), Reach(5), Reach(8)], [Reach(3), Reach(7)]]

The code is included below.

My question is: I am applying this to every reach in a very large region (for example, New England), where any given reach may have millions of paths. There's probably no way to avoid this being a very long operation, but is there a Pythonic way to perform it so that brand-new paths aren't generated on every run?

For example, if I run get_paths(h, 2) and all paths upstream from 2 are found, can I later run get_paths(h, 1) without retracing all of the paths in 2?

import collections

# Object representing a stream reach.  Used to construct a hierarchy for accumulation function
class Reach(object):
    def __init__(self):
        self.name = None
        self.ds = None
        self.us = set()

    def __repr__(self):
        return "Reach({})".format(self.name)


def build_hierarchy(flows):
    hierarchy = collections.defaultdict(lambda: Reach())
    for reach_id, parent in flows:
        if reach_id:
            hierarchy[reach_id].name = reach_id
            hierarchy[parent].name = parent
            hierarchy[reach_id].ds = hierarchy[parent]
            hierarchy[parent].us.add(hierarchy[reach_id])
    return hierarchy

def get_paths(h, start_node):
    # Depth-first walk that records one path per headwater (leaf) reach,
    # backtracking as it returns.
    def go_up(n):
        if not h[n].us:  # headwater reached: snapshot the current path
            paths.append(current_path[:])
        for us in h[n].us:
            current_path.append(us)
            go_up(us.name)
        if current_path:  # drop this reach before returning to the caller
            current_path.pop()
    paths = []
    current_path = []
    go_up(start_node)
    return paths

test_tree = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
h = build_hierarchy(test_tree)
p = get_paths(h, 1)
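
For illustration, the kind of reuse being asked about could look roughly like the following: a per-node cache of path lists that later calls splice into instead of re-walking the subtree. This is only a sketch against the code above; get_paths_cached and path_cache are hypothetical names, not part of the original post.

# Sketch only: memoized variant of get_paths. The cache is keyed by reach
# name, so paths found for reach 2 are reused when computing reach 1.
path_cache = {}

def get_paths_cached(h, start_node):
    if start_node in path_cache:
        return path_cache[start_node]
    paths = []
    for us in h[start_node].us:
        child_paths = get_paths_cached(h, us.name)
        if child_paths:
            # Prepend this child to every path found upstream of it.
            paths.extend([us] + p for p in child_paths)
        else:
            # Headwater reach: the path is just the child itself.
            paths.append([us])
    path_cache[start_node] = paths
    return paths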

EDIT: A few weeks ago I asked a similar question about finding "ALL" upstream reaches in a network and received an excellent answer that was very fast:

class Node(object):

    def __init__(self):
        self.name = None
        self.parent = None
        self.children = set()
        self._upstream = set()

    def __repr__(self):
        return "Node({})".format(self.name)

    @property
    def upstream(self):
        if self._upstream:
            return self._upstream
        else:
            for child in self.children:
                self._upstream.add(child)
                self._upstream |= child.upstream
            return self._upstream

import collections

edges = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
nodes = collections.defaultdict(lambda: Node())

for node, parent in edges:
    nodes[node].name = node
    nodes[parent].name = parent
    nodes[node].parent = nodes[parent]
    nodes[parent].children.add(nodes[node])

I noticed that the upstream property in this code adds upstream nodes in sequential order, but because it accumulates recursively into a set I can't find a good way to collect them into a single ordered list. Perhaps there is a way to modify this code so that it preserves the order.
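
(One possible modification, sketched here for illustration only and assuming the Node class above: keep a cached, ordered list instead of a set. The upstream_list name is hypothetical, not from the original answer.)

class Node(object):

    def __init__(self):
        self.name = None
        self.parent = None
        self.children = set()
        self._upstream_list = []

    def __repr__(self):
        return "Node({})".format(self.name)

    @property
    def upstream_list(self):
        # Build the list once, depth-first, so each child is immediately
        # followed by everything upstream of it; later calls reuse the list.
        if not self._upstream_list:
            for child in self.children:
                self._upstream_list.append(child)
                self._upstream_list.extend(child.upstream_list)
        return self._upstream_list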

asked Jan 28 '15 by triphook


2 Answers

Yes, you can do this. I'm not fully sure what your constraints are; however, this should get you on the right track. The worst-case run time of both methods is O(|E| + |V|); the only difference is that p.dfsh caches previously evaluated results, whereas p.dfs does not.

This will add additional space overhead, so be aware of that tradeoff: you'll save many iterations (depending on your data set) at the expense of more memory, no matter what. Unfortunately, the caching doesn't improve the order of growth, only the practical run time:

points = set([
    (11, 9),
    (10, 9), 
    (9, 6), 
    (6, 2), 
    (8, 5), 
    (5, 4), 
    (4, 2), 
    (2, 1), 
    (3, 1),
    (7, 3),
])

class PathFinder(object):

    def __init__(self, points):
        self.graph  = self._make_graph(points)
        self.hierarchy = {}

    def _make_graph(self, points):
        # Map each parent reach to the set of its upstream (child) reaches,
        # using the (child, parent) order of the input pairs.
        graph = {}
        for node, parent in points:
            graph.setdefault(parent, set()).add(node)
        return graph

    def dfs(self, start):
        visited = set()
        stack = [start]

        _count = 0
        while stack:
            _count += 1
            vertex = stack.pop()
            if vertex not in visited:
                visited.add(vertex)
                if vertex in self.graph:
                    stack.extend(v for v in self.graph[vertex])

        print "Start: {s} | Count: {c} |".format(c=_count, s=start),
        return visited

    def dfsh(self, start):
        visited = set()
        stack = [start]

        _count = 0
        while stack:
            _count += 1

            vertex = stack.pop()
            if vertex not in visited:
                if vertex in self.hierarchy:
                    visited.update(self.hierarchy[vertex])
                else:
                    visited.add(vertex)
                    if vertex in self.graph:
                        stack.extend([v for v in self.graph[vertex]])
        self.hierarchy[start] = visited

        print "Start: {s} | Count: {c} |".format(c=_count, s=start),
        return visited

p = PathFinder(points)
print p.dfsh(1)
print p.dfsh(2)
print p.dfsh(9)
print p.dfsh(6)
print p.dfsh(2)
print 
print p.dfs(1)
print p.dfs(2)
print p.dfs(9)
print p.dfs(6)
print p.dfs(2)

The output for p.dfsh is the following:

Start: 1 | Count: 11 | set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])
Start: 9 | Count: 3 | set([9, 10, 11])
Start: 6 | Count: 2 | set([9, 10, 11, 6])
Start: 2 | Count: 1 | set([2, 4, 5, 6, 8, 9, 10, 11])

The output for just the regular p.dfs is:

Start: 1 | Count: 11 | set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])
Start: 9 | Count: 3 | set([9, 10, 11])
Start: 6 | Count: 4 | set([9, 10, 11, 6])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])

As you can see, I do a DFS, but I keep track of previous iterations, within reason. I don't want to keep track of all possible previous paths, because on a large data set that would take up a ridiculous amount of memory.

In the output, you can see the iteration count for p.dfsh(2) go from 8 to 1, and likewise the count for p.dfsh(6) drop from 4 to 2 because of the previous computation of p.dfsh(9). This is a modest runtime improvement over the standard DFS, especially on significantly large data sets.

answered Oct 23 '22 by Friendly King


Sure, assuming you have enough memory to store all the paths from each node, you can just use a straightforward modification of the code you've received in that answer:

class Reach(object):
    def __init__(self):
        self.name = None
        self.ds = None
        self.us = set()
        self._paths = []

    def __repr__(self):
        return "Reach({})".format(self.name)

    @property
    def paths(self):
        if not self._paths:
            for child in self.us:
                if child.paths:
                    self._paths.extend([child] + path for path in child.paths)
                else:
                    self._paths.append([child])
        return self._paths
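
Wiring that class up with the question's edge set (a usage sketch, not shown in the original answer) could look like this:

import collections

edges = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
h = collections.defaultdict(Reach)
for reach_id, parent in edges:
    h[reach_id].name = reach_id
    h[parent].name = parent
    h[reach_id].ds = h[parent]
    h[parent].us.add(h[reach_id])

print(h[1].paths)
# e.g. [[Reach(2), Reach(6), Reach(9), Reach(11)], ..., [Reach(3), Reach(7)]]
# Computing h[1].paths also fills in h[2].paths, h[6].paths, and so on, so later
# lookups on those reaches return the cached lists without retracing anything.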

Mind you, for some 20,000 reaches the required memory for this approach will be on the order of gigabytes. Assuming a generally balanced tree of reaches, the required memory is O(n^2), where n is the total number of reaches; that would be 4-8 GiB for 20,000 reaches, depending on your system. The required time is O(1) for any node, though, once the paths from h[1] have been computed.

answered Oct 23 '22 by Kolmar