I am using SciPy's KDTree implementation on a large (300 MB) file of points. Is there a way I can just save the data structure to disk and load it again, or am I stuck reading the raw points from the file and constructing the data structure each time I start my program? I am constructing the KDTree as follows:
def buildKDTree(self):
    self.kdpoints = numpy.fromfile("All", sep=' ')
    self.kdpoints.shape = self.kdpoints.size // self.NDIM, self.NDIM
    self.kdtree = KDTree(self.kdpoints, leafsize=self.kdpoints.shape[0] + 1)
    print "Preparing KDTree... Ready!"
Any suggestions please?
KDTree uses nested classes to define its node types (innernode, leafnode). Pickle only works on module-level class definitions, so a nested class trips it up:
import cPickle

class Foo(object):
    class Bar(object):
        pass

obj = Foo.Bar()
print obj.__class__
cPickle.dumps(obj)

Output:

<class '__main__.Bar'>
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed
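As an aside (not part of the original answer): on Python 3.4+, pickle protocol 4 records a class's __qualname__, so a nested class like this pickles without any workaround. A minimal sketch:

```python
import pickle

class Foo(object):
    class Bar(object):
        pass

obj = Foo.Bar()
# Protocol 4 (the default since Python 3.8) looks classes up by
# __qualname__, so the nested definition Foo.Bar is found.
data = pickle.dumps(obj, protocol=4)
restored = pickle.loads(data)
print(restored.__class__)  # <class '__main__.Foo.Bar'>
```

This only helps if you can move to Python 3; under Python 2's cPickle the monkey-patch below is still needed.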
However, there is a (hacky) workaround: monkey-patch the nested class definitions into the scipy.spatial.kdtree module at module scope so the pickler can find them. As long as all of the code that reads and writes pickled KDTree objects installs these patches first, this hack works fine:
import cPickle
import numpy
from scipy.spatial import kdtree
# patch module-level attribute to enable pickle to work
kdtree.node = kdtree.KDTree.node
kdtree.leafnode = kdtree.KDTree.leafnode
kdtree.innernode = kdtree.KDTree.innernode
x, y = numpy.mgrid[0:5, 2:8]
t1 = kdtree.KDTree(zip(x.ravel(), y.ravel()))
r1 = t1.query([3.4, 4.1])
raw = cPickle.dumps(t1)
# unpickle the tree (the patches above must already be installed here too)
t2 = cPickle.loads(raw)
r2 = t2.query([3.4, 4.1])
print t1.tree.__class__
print repr(raw)[:70]
print t1.data[r1[1]], t2.data[r2[1]]
Output:
<class 'scipy.spatial.kdtree.innernode'>
"ccopy_reg\n_reconstructor\np1\n(cscipy.spatial.kdtree\nKDTree\np2\nc_
[3 4] [3 4]
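If you would rather avoid the monkey-patch entirely, a simpler alternative (my sketch, not from the answer above) is to persist only the points array in NumPy's binary .npy format and rebuild the tree at startup; np.load is far faster than re-parsing a 300 MB text file with fromfile(..., sep=' '), and scipy's cKDTree constructs trees quickly:

```python
import numpy as np
from scipy.spatial import cKDTree  # C implementation; builds much faster than KDTree

points = np.random.rand(1000, 3)  # stand-in for your parsed point data

# One-time conversion: store the points in binary form.
np.save("points.npy", points)

# Every startup: binary load + tree construction, no text parsing, no pickling.
loaded = np.load("points.npy")
tree = cKDTree(loaded)

dist, idx = tree.query([0.5, 0.5, 0.5])
```

Whether this beats pickling the whole tree depends on your data, but for many point sets the tree build is cheap next to the text parse you are doing now.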