I use Cython to wrap C++ code and expose it to Python for interactive work. My problem is that I need to read large graphs (several gigabytes) from file and they end up twice in the memory. Can anyone help me diagnose and solve this problem?
My Cython wrapper for the graph class looks like this:
cdef extern from "../src/graph/Graph.h":
cdef cppclass _Graph "Graph":
_Graph() except +
_Graph(count) except +
count numberOfNodes() except +
count numberOfEdges() except +
cdef class Graph:
"""An undirected, optionally weighted graph"""
cdef _Graph _this
def __cinit__(self, n=None):
if n is not None:
self._this = _Graph(n)
# any _thisect which appears as a return type needs to implement setThis
cdef setThis(self, _Graph other):
#del self._this
self._this = other
return self
def numberOfNodes(self):
return self._this.numberOfNodes()
def numberOfEdges(self):
return self._this.numberOfEdges()
If a Python Graph needs to be returned, it needs to be created empty and then the setThis
method is used to set the native _Graph
instance. This happens, for example, when a Graph
is read from file. This is the job of this class:
cdef extern from "../src/io/METISGraphReader.h":
cdef cppclass _METISGraphReader "METISGraphReader":
_METISGraphReader() except +
_Graph read(string path) except +
cdef class METISGraphReader:
""" Reads the METIS adjacency file format [1]
[1]: http://people.sc.fsu.edu/~jburkardt/data/metis_graph/metis_graph.html
"""
cdef _METISGraphReader _this
def read(self, path):
pathbytes = path.encode("utf-8") # string needs to be converted to bytes, which are coerced to std::string
return Graph(0).setThis(self._this.read(pathbytes))
Interactive usage looks like this:
>>> G = graphio.METISGraphReader().read("giant.metis.graph")
After the reading from file is done and X GB memory are used, there is a phase where obviously copying happens, and after that 2X GB memory are used. The entire memory is freed when del G
is called.
Where is my error which leads to the graph being copied and existing twice in memory?
I don't have a definitive answer for you, but I have a theory.
The Cython wrappers that you wrote are unusual, in that they wrap the C++ object directly instead of a pointer to it.
The following code is particularly inefficient:
cdef setThis(self, _Graph other):
self._this = other
return self
The reason is that your _Graph
class contains several STL vectors, and those will have to be copied over. So, when your other
object is assigned to self._this
the memory usage is effectively doubled (or worse, since the STL allocators can overallocate for performance reasons).
I wrote a simple test that matches yours and added logging everywhere to see how objects are created, copied or destroyed. I can't find any issues there. The copies do happen, but after the assignment is complete I see that only one object remains.
So my theory is that the extra memory that you see is related to STL allocator logic in the vectors. All that extra memory must be attached to the final object after the copies.
My recommendation is that you switch to the more standard pointer based wrapping. Your _Graph
wrapper then should be defined more or less as follows:
cdef class Graph:
"""An undirected, optionally weighted graph"""
cdef _Graph* _this
def __cinit__(self, n=None):
if n is not None:
self._this = new _Graph(n)
else:
self._this = 0
cdef setThis(self, _Graph* other):
del self._this
self._this = other
return self
def __dealloc__(self):
del self._this
Note that I need to delete _this
because it is a pointer.
You will then need to modify your METISGraphReader::read()
method to return a heap allocated Graph
. The prototype of this method should be changed to:
Graph* METISGraphReader::read(std::string path);
Then the Cython wrapper for it can be written as:
def read(self, path):
pathbytes = path.encode("utf-8") # string needs to be converted to bytes, which are coerced to std::string
return Graph().setThis(self._this.read(pathbytes))
If you do it this way there is only one object, the one that is created on the heap by read()
. A pointer to that object is returned to the read()
Cython wrapper, which then installs it in a brand new Graph()
instance. The only thing that gets copied is the 4 or 8 bytes of the pointer.
I hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With