I have a custom node class in Python whose instances are built into a graph (stored as a dictionary). Since these take a while to create, I'd like to pickle them so that I don't have to reconstruct them every time I run my code.
Unfortunately, because this graph has cycles, cPickle hits the maximum recursion depth:
RuntimeError: maximum recursion depth exceeded while pickling an object
This is my node object:
class Node:
    def __init__(self, name):
        self.name = name
        self.uid = 0
        self.parents = set()
        self.children = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, that):
        return self.name == that.name

    def __str__(self):
        return "\n".join(["Name: " + self.name,
                          "\tChildren:" + ", ".join([c.name for c in self.children]),
                          "\tParents:" + ", ".join([p.name for p in self.parents])
                          ]
                         )
This is how I build my graph:
def buildGraph(input):
    graph = {}
    idToNode = {}
    for line in input:
        ## Input from text line by line looks like
        ## source.node -> target.node
        source, arr, target = line.split()
        if source in graph:
            nsource = graph[source]
        else:
            nsource = Node(source)
            nsource.uid = len(graph)
            graph[source] = nsource
            idToNode[nsource.uid] = nsource
        if target in graph:
            ntarget = graph[target]
        else:
            ntarget = Node(target)
            ntarget.uid = len(graph)
            graph[target] = ntarget
            idToNode[ntarget.uid] = ntarget
        nsource.children.add(ntarget)
        ntarget.parents.add(nsource)
    return graph
Then in my main, I have
graph = buildGraph(input_file)
bo = cPickle.dumps(graph)
and the second line is where I get my recursion depth error.
Are there any solutions outside of changing the structure of Node?
You need to prepare the object for pickling: if you have cycles, you need to break them and store that information in some other form.
Pickle uses the method __getstate__ to prepare an object for pickling (it is called just before), and __setstate__ to initialize the object when it is unpickled.
class SomethingPickled(object):

    ## Compress and uncycle data before pickle.
    def __getstate__(self):
        # copy the instance dict
        state = self.__dict__.copy()
        # break cycles
        state['uncycled'] = self.yourUncycleMethod(state['cycled'])
        del state['cycled']
        # send to pickle
        return state

    ## Expand data when unpickling.
    def __setstate__(self, state):
        # restore cycles
        state['cycled'] = self.yourCycleMethod(state['uncycled'])
        del state['uncycled']
        self.__dict__.update(state)
I am sure that you will find an idea of how to break and rejoin the cycles :)
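For the graph from the question, one concrete way to apply this idea (a rough sketch of my own, not tested against the original data; flatten_graph and rebuild_graph are made-up helper names) is to break the cycles at the graph level rather than per node: pickle only node names, uids and name pairs for the edges, then re-link the Node objects after loading:

import cPickle

def flatten_graph(graph):
    # Break the cycles: keep only plain data (names, uids and name pairs
    # for the edges), nothing that references a Node object.
    nodes = [(node.name, node.uid) for node in graph.values()]
    edges = [(node.name, child.name) for node in graph.values()
                                     for child in node.children]
    return nodes, edges

def rebuild_graph(state):
    # Restore the cycles: recreate the Node objects, then re-link them.
    nodes, edges = state
    graph = {}
    for name, uid in nodes:
        node = Node(name)
        node.uid = uid
        graph[name] = node
    for source, target in edges:
        graph[source].children.add(graph[target])
        graph[target].parents.add(graph[source])
    return graph

# Usage: pickle the flattened, cycle-free form instead of the graph itself.
bo = cPickle.dumps(flatten_graph(graph))
graph = rebuild_graph(cPickle.loads(bo))

Since the pickled structure is just lists of strings and tuples, it contains no object references at all, and the recursion depth no longer depends on the shape of the graph.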
I don't think that the fact that your graph is cyclic is the problem -- pickle (and cPickle) should handle cyclic data structures just fine. I tried the following (with your definition of Node) and it worked just fine:
>>> n1 = Node('a')
>>> n2 = Node('b')
>>> n1.parents.add(n2)
>>> n2.parents.add(n1)
>>> n2.children.add(n1)
>>> n1.children.add(n1)
>>> import cPickle as pickle
>>> pickle.dumps(n1)
Indeed, even with large cycles I didn't run into a problem. E.g., this works fine for me:
>>> def node_cycle(n):
...     start_node = prev_node = Node('node0')
...     for i in range(n):
...         node = Node('node%d' % (i+1))
...         node.parents.add(prev_node)
...         prev_node.children.add(node)
...         prev_node = node
...     start_node.parents.add(node)
...     node.children.add(start_node)
...     return start_node
>>> cycle = node_cycle(100000) # cycle of 100k nodes
>>> pickle.dumps(cycle)
(This was all tested on Python 2.7.1)
There are other reasons why pickle might end up with very deep recursion though, depending on the shape of your data structure. If this is the real problem, then you might be able to fix it with something like this:
>>> import sys
>>> sys.setrecursionlimit(10000)
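The recursion depth pickle needs grows with the depth of the structure rather than with the presence of cycles, so a long parent-to-child chain with no cycle at all can hit the limit. A small sketch of that situation using the question's Node class (node_chain is a hypothetical helper; the exact chain length at which the default limit of 1000 is exceeded depends on the pickle implementation):

import sys
import cPickle as pickle

def node_chain(n):
    # Build a deep but acyclic parent -> child chain of n nodes.
    head = prev = Node('chain0')
    for i in range(1, n):
        node = Node('chain%d' % i)
        node.parents.add(prev)
        prev.children.add(node)
        prev = node
    return head

deep = node_chain(2000)
# A chain this deep may already raise "maximum recursion depth exceeded
# while pickling an object" under the default limit.
sys.setrecursionlimit(10000)   # keep the value modest: a huge limit can let
                               # the interpreter overflow the C stack instead
data = pickle.dumps(deep)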
Here, this modified Node class holds only the names of the related objects, as strings, inside each node, and gives you a set of full "Node" objects when you retrieve either the "children" or the "parents" attribute of a node.
Internally there are no cycles, so it should avoid the infinite-recursion trap. You can implement additional auxiliary methods to ease navigation as you want.
class Node(object):
    all_nodes = {}

    def __new__(cls, name):
        self = object.__new__(cls)
        cls.all_nodes[name] = self
        return self

    def __getstate__(self):
        self.all_nodes = self.__class__.all_nodes
        return self.__dict__

    def __setstate__(self, dct):
        self.__class__.all_nodes = dct["all_nodes"]
        del dct["all_nodes"]
        self.__dict__ = dct

    def __init__(self, name):
        #self.all_nodes = self.__class__.all_nodes
        self.name = name
        self.uid = 0
        self._parents = set()
        self._children = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, that):
        return self.name == that.name

    def __repr__(self):
        return "\n" + "\n".join(["Name: " + self.name,
                                 "\tChildren:" + ", ".join([c.name for c in self.children]),
                                 "\tParents:" + ", ".join([p.name for p in self.parents])
                                 ]
                                )

    def get_relations(self, which):
        names = getattr(self, which)
        return set(self.__class__.all_nodes[name] for name in names)

    @property
    def children(self):
        return self.get_relations("_children")

    @property
    def parents(self):
        return self.get_relations("_parents")

    def __contains__(self, item):
        return item.name in self._children

    def add(self, child):
        self._children.add(child.name)
        child._parents.add(self.name)

    connect_child = add
#example and testing:
from cPickle import loads, dumps
n1 = Node("n1")
n2 = Node("n2")
n3 = Node("n3")
n1.add(n2)
n2.add(n3)
n3.add(n1)
print n1, n2, n3
p1 = dumps(n1)
Node.all_nodes.clear()
p2 = loads(p1)
print p2
print p2.children
print p2.children.pop().children
print Node.all_nodes
The drawback is that it maintains a class-level dictionary named "all_nodes" holding references to every node actually created. (Pickle is smart enough to only pickle this dictionary once for a given graph, since it is referenced by all Node objects.) The problem with the class-wide "all_nodes" reference shows up if you need to pickle and unpickle different sets of graphs: say you create graph g1 with one set of nodes and, in another run, graph g2 with another set of nodes; if you unpickle g1 and later g2, the unpickling of g2 will override the node references for g1. If you need this to work, ask in a comment and I could come up with something - the easiest thing I can think of is having a "Graph" class that holds the dictionary of all the nodes (instead of keeping it in the Node class), as sketched below.
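For reference, here is a rough, untested sketch of that last suggestion (Graph, GraphNode and connect are names invented here for illustration, not the answer author's code): the name-to-node dictionary lives on a Graph instance, so separate graphs keep separate node registries and can be pickled and unpickled independently:

import cPickle

class GraphNode(object):
    # Simplified node: relations are stored as names, so the node itself
    # contains no object cycles.
    def __init__(self, name):
        self.name = name
        self._parents = set()
        self._children = set()

class Graph(object):
    # The name -> node dictionary lives on the graph, not on the class,
    # so two pickled graphs cannot overwrite each other's registry.
    def __init__(self):
        self.nodes = {}

    def node(self, name):
        if name not in self.nodes:
            self.nodes[name] = GraphNode(name)
        return self.nodes[name]

    def connect(self, source, target):
        self.node(source)._children.add(target)
        self.node(target)._parents.add(source)

    def children(self, name):
        return [self.nodes[n] for n in self.nodes[name]._children]

    def parents(self, name):
        return [self.nodes[n] for n in self.nodes[name]._parents]

# Two independent graphs can be pickled and restored side by side.
g1 = Graph(); g1.connect("n1", "n2"); g1.connect("n2", "n1")
g2 = Graph(); g2.connect("x", "y")
g1 = cPickle.loads(cPickle.dumps(g1))
g2 = cPickle.loads(cPickle.dumps(g2))
print [n.name for n in g1.children("n1")]   # ['n2']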