Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory utilization in recursive vs an iterative graph traversal

I have looked at some common tools like Heapy to measure how much memory is being utilized by each traversal technique but I don't know if they are giving me the right results. Here is some code to give the context.

The code simply measures the number of unique nodes in a graph. Two traversal techniques provided viz. count_bfs and count_dfs

import sys
from guppy import hpy
class Graph:
    def __init__(self, key):
        self.key = key       #unique id for a vertex 
        self.connections = []
        self.visited = False 

def count_bfs(start):
    parents = [start]
    children = []
    count = 0
    while parents:
        for ind in parents:
            if not ind.visited:
                count += 1
                ind.visited = True
                for child in ind.connections:
                        children.append(child)

        parents = children
        children = []
    return count

def count_dfs(start):
    if not start.visited:
          start.visited = True
    else:
          return 0

    n = 1
    for connection in start.connections:
        n += count_dfs(connection)
    return n


def construct(file, s=1): 
    """Constructs a Graph using the adjacency matrix given in the file

    :param file: path to the file with the matrix
    :param s: starting node key. Defaults to 1

    :return start vertex of the graph
    """
    d = {}
    f = open(file,'rU')
    size = int(f.readline())
    for x in xrange(1,size+1):
        d[x] = Graph(x)
    start = d[s]
    for i in xrange(0,size):
           l = map(lambda x: int(x), f.readline().split())
           node = l[0]
           for child in l[1:]:
               d[node].connections.append(d[child])
    return start


if __name__ == "__main__":
    s = construct(sys.argv[1])
    #h = hpy()
    print(count_bfs(s))
    #print h.heap()
    s = construct(sys.argv[1])
    #h = hpy()
    print(count_dfs(s))
    #print h.heap()

I want to know by what factor is the total memory utilization different in the two traversal techniques viz. count_dfs and count_bfs? One might have the intuition that dfs may be expensive as a new stack is created for every function call. How can the total memory allocations in each traversal technique be measured?
Do the (commented) hpy statements give the desired measure?

Sample file with connections:

4
1 2 3
2 1 3
3 4 
4
like image 824
ajmartin Avatar asked Mar 09 '13 02:03

ajmartin


1 Answers

This being a python question, it may be more important how much stack space is used than how much total memory. Cpython has a low limit of 1000 frames because it shares its call stack with the c call stack, which in turn is limited to the order of one megabyte in most places. For this reason you should almost* always prefer iterative solutions to recursive ones when the recursion depth is unbounded.

* other implementations of python may not have this restriction. The stackless variants of cpython and pypy have this exact property

like image 190
SingleNegationElimination Avatar answered Sep 26 '22 18:09

SingleNegationElimination