
Is there a way to pickle a scipy.interpolate.Rbf() object?

I'm creating a radial basis function interpolation model for a rather large dataset. The main call, scipy.interpolate.Rbf(), takes about one minute and 14 GB of RAM. Not every machine this is supposed to run on can handle that, and the program will run on the same dataset very often, so I'd like to pickle the result to a file. This is a simplified example:

import scipy.interpolate as inter
import numpy as np
import cPickle

x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
y = np.array([1,2,3,4])

rbfi = inter.Rbf(x[:,0], x[:,1], x[:,2], y)

RBFfile = open('picklefile','wb')
RBFpickler = cPickle.Pickler(RBFfile,protocol=2)
RBFpickler.dump(rbfi)
RBFfile.close()

The RBFpickler.dump() call fails with a "can't pickle <type 'instancemethod'>" error. As I understand it, that means there's a method somewhere in the object (well, rbfi() is callable), and that it can't be pickled for some reason I don't quite understand.

Does anyone know another way to pickle this object, or some other way to save the result of the inter.Rbf() call?

There are some arrays of shape (nd,n) and (n,n) in there (rbfi.A, rbfi.xi, rbfi.di...), which I assume store all the interesting information. I guess I could pickle just those arrays, but then I'm not sure how I could put the object together again...
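Storing only the arrays could be sketched roughly as follows. This is a minimal sketch, not a general solution: it assumes the default settings of Rbf (multiquadric basis, Euclidean norm), relies on the documented attributes xi, nodes and epsilon, and evaluates the interpolant by hand instead of rebuilding an Rbf object. The file name and the helper evaluate_saved_rbf are hypothetical names chosen for illustration.

```python
import os
import tempfile

import numpy as np
import scipy.interpolate as inter

x = np.array([[1, 2, 3], [3, 4, 5], [7, 8, 9], [1, 5, 9]], dtype=float)
y = np.array([1, 2, 3, 4], dtype=float)

# Expensive step: solves the linear system for the weights (rbfi.nodes).
rbfi = inter.Rbf(x[:, 0], x[:, 1], x[:, 2], y)

# Store only what evaluation needs: centres, weights and shape parameter.
# The file name is a hypothetical placeholder.
path = os.path.join(tempfile.mkdtemp(), "rbf_state.npz")
np.savez(path, xi=rbfi.xi, nodes=rbfi.nodes, epsilon=rbfi.epsilon)


def evaluate_saved_rbf(path, points):
    """Evaluate a saved default Rbf at points of shape (m, nd).

    Assumes the multiquadric basis and the Euclidean norm, which are
    the defaults of scipy.interpolate.Rbf.
    """
    with np.load(path) as state:
        centres = state["xi"].T          # shape (n, nd)
        weights = state["nodes"]         # shape (n,)
        epsilon = float(state["epsilon"])
    # Pairwise Euclidean distances between query points and centres: (m, n)
    diff = points[:, np.newaxis, :] - centres[np.newaxis, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))
    # Multiquadric basis function: sqrt((r / epsilon)**2 + 1)
    basis = np.sqrt((r / epsilon) ** 2 + 1.0)
    return basis.dot(weights)


query = np.array([[2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
restored = evaluate_saved_rbf(path, query)
direct = rbfi(query[:, 0], query[:, 1], query[:, 2])
```

Here restored and direct should agree to floating-point precision. The saved file only holds the (nd,n) centres and the n weights, so the (n,n) matrix is not needed for evaluation. A different basis function or norm would need its own formula in the helper.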

Edit: Additional constraint: I'm not allowed to install additional libraries on the system. The only way I can include them is if they are pure Python and I can just include them with the script without having to compile anything.

Zak asked Nov 01 '22 22:11


1 Answer

I'd use dill to serialize the result. Or, if you want a cached function, you could use klepto to cache the function call, which minimizes re-evaluation of the function.

Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy.interpolate as inter
>>> import numpy as np
>>> import dill
>>> import klepto
>>> 
>>> x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
>>> y = np.array([1,2,3,4])
>>> 
>>> # build an on-disk archive for numpy arrays,
>>> # with a dictionary-style interface  
>>> p = klepto.archives.dir_archive(serialized=True, fast=True)
>>> # add a caching algorithm, so when threshold is hit,
>>> # memory is dumped to disk
>>> c = klepto.safe.lru_cache(cache=p)
>>> # decorate the target function with the cache
>>> c(inter.Rbf)
<function Rbf at 0x104248668>
>>> rbf = _
>>> 
>>> # 'rbf' is now cached, so all repeat calls are looked up
>>> # from disk or memory
>>> d = rbf(x[:,0], x[:,1], x[:,2], y)
>>> d
<scipy.interpolate.rbf.Rbf object at 0x1042454d0>
>>> d.A
array([[ 1.        ,  1.22905719,  2.36542472,  1.70724365],
       [ 1.22905719,  1.        ,  1.74422655,  1.37605151],
       [ 2.36542472,  1.74422655,  1.        ,  1.70724365],
       [ 1.70724365,  1.37605151,  1.70724365,  1.        ]])
>>> 

continuing…

>>> # the cache is serializing the result object behind the scenes
>>> # it also works if we directly pickle and unpickle it
>>> _d = dill.loads(dill.dumps(d))
>>> _d
<scipy.interpolate.rbf.Rbf object at 0x104245510>
>>> _d.A
array([[ 1.        ,  1.22905719,  2.36542472,  1.70724365],
       [ 1.22905719,  1.        ,  1.74422655,  1.37605151],
       [ 2.36542472,  1.74422655,  1.        ,  1.70724365],
       [ 1.70724365,  1.37605151,  1.70724365,  1.        ]])
>>>

Get klepto and dill here: https://github.com/uqfoundation
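Since the question asks about writing to a file, the round trip might look like the sketch below. dill.dump and dill.load mirror the pickle interface. The file name is a hypothetical placeholder, and the fallback to the standard pickle module is an assumption that is only expected to hold on Python 3, where bound methods can be pickled; on Python 2 the fallback would hit the same error as in the question.

```python
import os
import tempfile

import numpy as np
import scipy.interpolate as inter

try:
    import dill as pickler      # serializes objects that plain pickle rejects
except ImportError:
    import pickle as pickler    # assumption: only sufficient on Python 3

x = np.array([[1, 2, 3], [3, 4, 5], [7, 8, 9], [1, 5, 9]], dtype=float)
y = np.array([1, 2, 3, 4], dtype=float)

rbfi = inter.Rbf(x[:, 0], x[:, 1], x[:, 2], y)

# Hypothetical file location, used for illustration only.
path = os.path.join(tempfile.mkdtemp(), "rbf.pkl")

# Write the fitted interpolator to disk once ...
with open(path, "wb") as handle:
    pickler.dump(rbfi, handle)

# ... and load it later without refitting.
with open(path, "rb") as handle:
    restored_rbfi = pickler.load(handle)

query = np.array([[2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
original_values = rbfi(query[:, 0], query[:, 1], query[:, 2])
restored_values = restored_rbfi(query[:, 0], query[:, 1], query[:, 2])
```

The restored object should give the same values as the original at the query points. As with any pickle-based format, the file should only be loaded from a trusted source and ideally with the same library versions that wrote it.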

Mike McKerns answered Nov 09 '22 13:11