Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pickling a graph with cycles

I have a custom node class in python that is built into a graph (which is a dictionary). Since these take a while to create, I'd like to pickle them so that I don't have to reconstruct them everytime I run my code.

Unfortunately, because this graph has cycles, cPickle hits the maximum recursion depth:

RuntimeError: maximum recursion depth exceeded while pickling an object

This is my node object:

class Node:
    def __init__(self, name):
        self.name = name
        self.uid = 0
        self.parents = set()
        self.children = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, that):
        return self.name == that.name

    def __str__(self):
        return "\n".join(["Name: " + self.name,
                          "\tChildren:" + ", ".join([c.name for c in self.children]),
                          "\tParents:" + ", ".join([p.name for p in self.parents])
                          ]
                         )

This is how I build my graph:

def buildGraph(input):
    graph = {}
    idToNode = {}

    for line in input:
        ## Input from text line by line looks like
        ## source.node -> target.node
        source, arr, target = line.split()
        if source in graph:
            nsource = graph[source]
        else:
            nsource = Node(source)
            nsource.uid = len(graph)
            graph[source] = nsource
            idToNode[nsource.uid] = nsource

        if target in graph:
            ntarget = graph[target]
        else:
            ntarget = Node(target)
            ntarget.uid = len(graph)
            graph[target] = ntarget
            idToNode[ntarget.uid] = ntarget

        nsource.children.add(ntarget)
        ntarget.parents.add(nsource)
    return graph

Then in my main, I have

    graph = buildGraph(input_file)
    bo = cPickle.dumps(graph)

and the second line is where I get my recursion depth error.

Are there any solutions outside of changing the structure of Node?

like image 734
JasonMond Avatar asked Feb 22 '12 18:02

JasonMond


3 Answers

You need to prepare the object for pickle: if you have a cycle you need to break cycles and store this information in some other form.

Pickle use methods __getstate__ to prepare object to pickle (it call before) and __setstate__ to initialize object.

class SomethingPickled(object):
    ## Compress and uncycle data before pickle.
    def __getstate__(self):
        # deep copy object
        state = self.__dict__.copy()
        # break cycles
        state['uncycled'] = self.yourUncycleMethod(state['cycled'])
        del state['cycle']
        # send to pickle
        return state

    ## Expand data before unpickle.
    def __setstate__(self, state):
        # restore cycles
        state['cycle'] = self.yourCycleMethod(state['uncycled'])
        del state['uncycle']
        self.__dict__.update(state)

I am sure than you will find idea how to break and join cycles :)

like image 77
Chameleon Avatar answered Nov 17 '22 05:11

Chameleon


I don't think that the fact that your graph is cyclic is the problem -- pickle (and cPickle) should handle cyclic data structures just fine. I tried the following (with your definition of Node) and it worked just fine:

>>> n1 = Node('a')
>>> n2 = Node('b')
>>> n1.parents.add(n2)
>>> n2.parents.add(n1)
>>> n2.children.add(n1)
>>> n1.children.add(n1)

>>> import cPickle as pickle
>>> pickle.dumps(n1)

Indeed, even with large cycles I didn't run into a problem. E.g., this works fine for me:

>>> def node_cycle(n):
...     start_node = prev_node = Node('node0')
...     for i in range(n):
...         node = Node('node%d' % (i+1))
...         node.parents.add(prev_node)
...         prev_node.children.add(node)
...         prev_node = node
...     start_node.parents.add(node)
...     node.children.add(start_node)

>>> cycle = node_cycle(100000) # cycle of 100k nodes
>>> pickle.dumps(cycle)

(This was all tested on Python 2.7.1)

There are other reasons why pickle might end up with very deep recursion though, depending on the shape of your data structure. If this is the real problem, then you might be able to fix it with something like this:

>>> import sys
>>> sys.setrecursionlimit(10000)
like image 44
Edward Loper Avatar answered Nov 17 '22 04:11

Edward Loper


Here, this modified node class holds only the names of the objects as strings in a node, and gives you a set with full "Node" objects when you retrieve either the "children" or the "parents" attribute of a node.

Internally there are no cycles - so it should avoid the infinity loop trap.You can implement additional auxiliar methods to ease navigation as you want.

class Node(object):
    all_nodes = {}
    def __new__(cls, name):
        self = object.__new__(cls)
        cls.all_nodes[name] = self
        return self

    def __getstate__(self):
        self.all_nodes = self.__class__.all_nodes
        return self.__dict__

    def __setstate__(self, dct):
        self.__class__.all_nodes = dct["all_nodes"]
        del dct["all_nodes"]
        self.__dict__ = dct

    def __init__(self, name):
        #self.all_nodes = self.__class__.all_nodes
        self.name = name
        self.uid = 0
        self._parents = set()
        self._children = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, that):
        return self.name == that.name

    def __repr__(self):
        return "\n" + "\n".join(["Name: " + self.name,
                          "\tChildren:" + ", ".join([c.name for c in self.children]),
                          "\tParents:" + ", ".join([p.name for p in self.parents])
                          ]
                         )
    def get_relations(self, which):
        names = getattr(self, which)
        return set(self.__class__.all_nodes[name] for name in names)
    @property
    def children(self):
        return self.get_relations("_children")
    @property
    def parents(self):
        return self.get_relations("_parents")

    def __contains__(self, item):
        return item.name in self._children

    def add(self, child):
        self._children.add(child.name)
        child._parents.add(self.name)
    connect_child = add



#example and testing:

from cPickle import loads, dumps

n1 = Node("n1")
n2 = Node("n2")
n3 = Node("n3")

n1.add(n2)
n2.add(n3)
n3.add(n1)

print n1, n2, n3


p1 = dumps(n1)
Node.all_nodes.clear()
p2 = loads(p1)

print p2
print p2.children
print p2.children.pop().children

print Node.all_nodes

The drawback is that it maintains a class dictionary named "all_nodes" where there are references to all actually created nodes. (Pickle is smart enough to only pickle this dictionary once for a given graph, since it is referenced by all Node objects) . The problem with the class wide "all_nodes" reference is if you need to pickle and unpickle different sets of graphs 9let say you do create graphs g1 with a set of nodes, in another run, create a graph g2 with another set of nodes, and then if you unpickle g1, and later g2, the unpickling of g2 will override the node references for g1). If you need this to work, ask in a comment and I could come up with something - the easiser thing I can think off is having a "graph" class that will hold a dictionary to all the nodes (insteadof having it in the Node class)

like image 1
jsbueno Avatar answered Nov 17 '22 04:11

jsbueno