
Saving KDTree object in Python?

I am using SciPy's KDTree implementation on points read from a large (300 MB) file. Is there a way to save the data structure to disk and load it again, or am I stuck with reading the raw points from the file and rebuilding the data structure every time I start my program? I construct the KDTree as follows:

def buildKDTree(self):
        # read whitespace-separated coordinates into a flat array
        self.kdpoints = numpy.fromfile("All", sep=' ')
        # reshape to (number of points, NDIM)
        self.kdpoints.shape = self.kdpoints.size // self.NDIM, self.NDIM
        self.kdtree = KDTree(self.kdpoints, leafsize = self.kdpoints.shape[0]+1)
        print "Preparing KDTree... Ready!"

Any suggestions please?

Legend asked Apr 24 '11 20:04



1 Answer

KDTree uses nested classes to define its node types (innernode, leafnode). Pickle (in Python 2, as used here) looks classes up by name at module level, so a nested class trips it up:

import cPickle

class Foo(object):
    class Bar(object):
        pass

obj = Foo.Bar()
print obj.__class__
cPickle.dumps(obj)

<class '__main__.Bar'>
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed

However, there is a (hacky) workaround: monkey-patch the class definitions into scipy.spatial.kdtree at module scope so the pickler can find them. As long as all of your code that reads or writes pickled KDTree objects installs these patches, this hack should work fine:

import cPickle
import numpy
from scipy.spatial import kdtree

# patch module-level attribute to enable pickle to work
kdtree.node = kdtree.KDTree.node
kdtree.leafnode = kdtree.KDTree.leafnode
kdtree.innernode = kdtree.KDTree.innernode

x, y = numpy.mgrid[0:5, 2:8]
t1 = kdtree.KDTree(zip(x.ravel(), y.ravel()))
r1 = t1.query([3.4, 4.1])
raw = cPickle.dumps(t1)

# read in the pickled tree
t2 = cPickle.loads(raw)
r2 = t2.query([3.4, 4.1])
print t1.tree.__class__
print repr(raw)[:70]
print t1.data[r1[1]], t2.data[r2[1]]

Output:

<class 'scipy.spatial.kdtree.innernode'>
"ccopy_reg\n_reconstructor\np1\n(cscipy.spatial.kdtree\nKDTree\np2\nc_
[3 4] [3 4]
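
The round trip above stays in memory. For the original question of saving to disk, the same pickle machinery can write to a file instead of a string. Below is a minimal sketch, not a definitive implementation: it assumes Python 3's pickle module (cPickle plays the same role in Python 2) and assumes the patches shown earlier are installed wherever a KDTree is dumped or loaded. The helper names save_tree and load_tree and the file name are hypothetical, not part of SciPy.

```python
import pickle


def save_tree(tree, path):
    """Write any picklable object (e.g. a patched KDTree) to `path`."""
    # Pickle data is binary, so the file must be opened in "wb" mode.
    with open(path, "wb") as fh:
        pickle.dump(tree, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_tree(path):
    """Read back an object previously written by save_tree."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

With these helpers, the program would call save_tree(t1, "kdtree.pkl") once after building the tree and t2 = load_tree("kdtree.pkl") on later runs, falling back to rebuilding from the raw points if the file is missing.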
samplebias answered Sep 21 '22 04:09
