I have written some code that finds all the paths upstream of a given reach in a dendritic stream network. As an example, if I represent the following network:
      4 -- 5 -- 8
     /
    2 --- 6 - 9 -- 10
   /           \
  1             11
   \
    3 ---- 7
as a set of (reach, downstream parent) pairs:
{(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
it will return all of the paths upstream of a node, for instance:
get_paths(h, 1)  # edited: previously had 11 instead of 1
[[Reach(2), Reach(6), Reach(9), Reach(11)], [Reach(2), Reach(6), Reach(9), Reach(10)], [Reach(2), Reach(4), Reach(5), Reach(8)], [Reach(3), Reach(7)]]
The code is included below.
My question is: I am applying this to every reach in a very large region (e.g., New England), where any given reach may have millions of upstream paths. There's probably no way to avoid this being a very long operation, but is there a Pythonic way to perform it so that brand-new paths aren't regenerated on each run?
For example, if I run get_paths(h, 2) and all paths upstream from 2 are found, can I later run get_paths(h, 1) without retracing all of the paths in 2?
import collections

# Object representing a stream reach. Used to construct a hierarchy for the accumulation function.
class Reach(object):
    def __init__(self):
        self.name = None
        self.ds = None   # downstream reach
        self.us = set()  # upstream reaches

    def __repr__(self):
        return "Reach({})".format(self.name)

def build_hierarchy(flows):
    hierarchy = collections.defaultdict(lambda: Reach())
    for reach_id, parent in flows:
        if reach_id:  # skip falsy (null/zero) reach IDs
            hierarchy[reach_id].name = reach_id
            hierarchy[parent].name = parent
            hierarchy[reach_id].ds = hierarchy[parent]
            hierarchy[parent].us.add(hierarchy[reach_id])
    return hierarchy

def get_paths(h, start_node):
    def go_up(n):
        if not h[n].us:  # headwater reach: the current path is complete
            paths.append(current_path[:])
        for us in h[n].us:
            current_path.append(us)
            go_up(us.name)
        if current_path:
            current_path.pop()

    paths = []
    current_path = []
    go_up(start_node)
    return paths
test_tree = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
h = build_hierarchy(test_tree)
p = get_paths(h, 1)
EDIT: A few weeks ago I asked a similar question about finding "ALL" upstream reaches in a network and received an excellent answer that was very fast:
class Node(object):
    def __init__(self):
        self.name = None
        self.parent = None
        self.children = set()
        self._upstream = set()

    def __repr__(self):
        return "Node({})".format(self.name)

    @property
    def upstream(self):
        if self._upstream:
            return self._upstream
        else:
            for child in self.children:
                self._upstream.add(child)
                self._upstream |= child.upstream
            return self._upstream

import collections

edges = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
nodes = collections.defaultdict(lambda: Node())
for node, parent in edges:
    nodes[node].name = node
    nodes[parent].name = parent
    nodes[node].parent = nodes[parent]
    nodes[parent].children.add(nodes[node])
I noticed that the upstream property of this code visits upstream nodes in sequential order as it recurses, but because it accumulates them into a set I can't find a good way to append them to a single ordered list. Perhaps there is a way to modify this code so that it preserves the order.
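To make the idea concrete, here is a minimal sketch of the kind of modification I have in mind (only tried on the toy network above): swap the cached set for a cached list, so that each child is followed immediately by everything upstream of it:

class Node(object):
    def __init__(self):
        self.name = None
        self.parent = None
        self.children = set()
        self._upstream = []  # ordered cache instead of a set

    def __repr__(self):
        return "Node({})".format(self.name)

    @property
    def upstream(self):
        # Memoized depth-first ordering: each child appears immediately
        # before everything upstream of it.
        if not self._upstream:
            for child in self.children:
                self._upstream.append(child)
                self._upstream.extend(child.upstream)
        return self._upstream

The order among siblings is still arbitrary because children is a set, but within each branch the upstream sequence is preserved.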
Yes, you can do this. I'm not fully sure what your constraints are, but this should get you on the right track. The worst-case run time is still O(|E| + |V|); the only difference is that p.dfsh caches previously computed traversals, while p.dfs does not. The caching adds space overhead, so be aware of that tradeoff: you'll save many iterations (depending on your data set) at the expense of more memory, no matter what. Unfortunately, the caching doesn't improve the order of growth, only the practical run time:
points = set([
    (11, 9),
    (10, 9),
    (9, 6),
    (6, 2),
    (8, 5),
    (5, 4),
    (4, 2),
    (2, 1),
    (3, 1),
    (7, 3),
])

class PathFinder(object):
    def __init__(self, points):
        self.graph = self._make_graph(points)
        self.hierarchy = {}

    def _make_graph(self, points):
        # Map each downstream reach to its upstream neighbors; in this
        # data set the downstream (parent) ID is always the smaller of the pair.
        graph = {}
        for p in points:
            less, more = min(p), max(p)
            if less not in graph:
                graph[less] = set([more])
            else:
                graph[less].add(more)
        return graph

    def dfs(self, start):
        visited = set()
        stack = [start]
        _count = 0
        while stack:
            _count += 1
            vertex = stack.pop()
            if vertex not in visited:
                visited.add(vertex)
                if vertex in self.graph:
                    stack.extend(v for v in self.graph[vertex])
        print "Start: {s} | Count: {c} |".format(c=_count, s=start),
        return visited

    def dfsh(self, start):
        visited = set()
        stack = [start]
        _count = 0
        while stack:
            _count += 1
            vertex = stack.pop()
            if vertex not in visited:
                # Reuse a previously cached traversal when available.
                if vertex in self.hierarchy:
                    visited.update(self.hierarchy[vertex])
                else:
                    visited.add(vertex)
                    if vertex in self.graph:
                        stack.extend([v for v in self.graph[vertex]])
        self.hierarchy[start] = visited
        print "Start: {s} | Count: {c} |".format(c=_count, s=start),
        return visited

p = PathFinder(points)
print p.dfsh(1)
print p.dfsh(2)
print p.dfsh(9)
print p.dfsh(6)
print p.dfsh(2)
print
print p.dfs(1)
print p.dfs(2)
print p.dfs(9)
print p.dfs(6)
print p.dfs(2)
The output for p.dfsh is the following:
Start: 1 | Count: 11 | set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])
Start: 9 | Count: 3 | set([9, 10, 11])
Start: 6 | Count: 2 | set([9, 10, 11, 6])
Start: 2 | Count: 1 | set([2, 4, 5, 6, 8, 9, 10, 11])
The output for just the regular p.dfs is:
Start: 1 | Count: 11 | set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])
Start: 9 | Count: 3 | set([9, 10, 11])
Start: 6 | Count: 4 | set([9, 10, 11, 6])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])
As you can see, I do a DFS but keep track of previous traversals, within reason: I don't want to cache every possible previous path, because on a large data set that would take up a ridiculous amount of memory. In the output, you can see the iteration count for p.dfsh(2) drop from 8 to 1; likewise, the count for p.dfsh(6) drops to 2 because of the previous computation of p.dfsh(9). This is a modest run-time improvement over the standard DFS, especially on significantly large data sets.
Sure, assuming you have enough memory to store all the paths from each node, you can just use a straightforward modification of the code you've received in that answer:
class Reach(object):
    def __init__(self):
        self.name = None
        self.ds = None
        self.us = set()
        self._paths = []  # memoized list of paths upstream of this reach

    def __repr__(self):
        return "Reach({})".format(self.name)

    @property
    def paths(self):
        # Computed once per reach; later calls return the cached list.
        if not self._paths:
            for child in self.us:
                if child.paths:
                    self._paths.extend([child] + path for path in child.paths)
                else:
                    self._paths.append([child])
        return self._paths
Mind you, for some 20,000 reaches the required memory for that approach will be on the order of gigabytes. Assuming a generally balanced tree of reaches, the required memory is O(n^2), where n is the total number of reaches; at roughly n^2 = 4 x 10^8 stored references of 8-16 bytes each, that works out to about 4-8 GiB for 20,000 reaches, depending on your system. Required time is O(1) for any node, though, once the paths from h[1] have been computed.
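To illustrate the reuse, here is a quick usage sketch, assuming the build_hierarchy function and test_tree from the question with this modified Reach class substituted in:

h = build_hierarchy(test_tree)

p2 = h[2].paths  # traverses everything above reach 2, caching a path list on each reach
p1 = h[1].paths  # reuses the lists already cached under 2 instead of retracing that subtree

As with your get_paths, the returned paths start at the children of the queried reach rather than at the reach itself.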