I have tried multiple approaches to pickle a Python function with its dependencies, following many recommendations on StackOverflow (such as dill, cloudpickle, etc.), but all of them seem to run into a fundamental issue that I cannot figure out.
I have a main module that pickles a function from an imported module and sends it over ssh to be unpickled and executed on a remote machine.
So main has:
import dill   # for example
import modulea
serial = dill.dumps(modulea.func)
send(serial)
On the remote machine:
import dill
serial = receive()
funcremote = dill.loads(serial)
funcremote()
If the functions being pickled and sent are top-level functions defined in main itself, everything works. When they are defined in an imported module, dill.loads fails with messages of the type "module modulea not found".
It appears that the module name is pickled along with the function name. I do not see any way to "fix up" the pickle to remove the dependency, or, alternatively, to create a dummy module in the receiver to become the recipient of the unpickling.
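To make this concrete, here is roughly what the two sides look like; modulea, func, and the file-based transfer are stand-ins for my actual code:
# sender.py -- local machine, where modulea is importable
import dill
import modulea

payload = dill.dumps(modulea.func)
with open('payload.pkl', 'wb') as f:
    f.write(payload)               # then copied to the remote host, e.g. with scp

# receiver.py -- remote machine, which does not have modulea installed
import dill

with open('payload.pkl', 'rb') as f:
    payload = f.read()

funcremote = dill.loads(payload)   # fails here with "No module named modulea"
funcremote()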
Any pointers will be much appreciated.
--prasanna
I'm the dill author. I do this exact thing over ssh, but with success. Currently, dill and any of the other serializers pickle modules by reference… so to successfully pass a function defined in a file, you have to ensure that the relevant module is also installed on the other machine. I do not believe there is any object serializer that serializes modules directly (i.e. not by reference).
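To see what "by reference" means in practice, compare what dill produces for a function defined in __main__ with one imported from a module (a quick sketch; os.path.join is just a stand-in for something like modulea.func):
import dill

def local_func():
    return 'hello'

# Defined in __main__: dill serializes the actual code object, so the
# receiving side never needs to import anything of yours.
payload_main = dill.dumps(local_func)

# Imported from a module: dill stores little more than "module + name",
# so the receiving side must be able to import that module to unpickle it.
import os.path
payload_ref = dill.dumps(os.path.join)
print(payload_ref)   # the bytes contain the module name, not the function's code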
Having said that, dill does have some options to serialize object dependencies. For example, for class instances, the default in dill is to not serialize class instances by reference… so the class definition can also be serialized and sent along with the instance. In dill, you can also (using a very new feature) serialize file handles by serializing the file contents instead of doing so by reference. But again, if you have the case of a function defined in a module, you are out of luck, as modules are serialized by reference pretty darn universally.
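For example (a sketch, assuming a reasonably recent dill; byref and fmode are the knobs referred to above):
import dill

class Thing(object):
    def greet(self):
        return 'hi'

t = Thing()
# byref=False pickles the class definition along with the instance, so the
# receiver does not need Thing to be importable; byref=True would pickle
# only a reference to the class.
payload = dill.dumps(t, byref=False)

# File handles: dill.FILE_FMODE also serializes the file contents,
# instead of just a reference to the open handle.
with open('notes.txt', 'w') as f:
    f.write('hello')
fh = open('notes.txt', 'r')
payload_fh = dill.dumps(fh, fmode=dill.FILE_FMODE)
fh.close()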
You might be able to use dill to pass a function defined in a module, however, just not by pickling the object, but by extracting the source and sending the source code. In pathos.pp and pyina, dill is used to extract the source and the dependencies of any object (including functions) and pass them to another computer/process/etc. However, since this is not an easy thing to do, dill can also use the failover of trying to extract a relevant import and send that instead of the source code.
You can understand, hopefully, that this is a messy messy thing to do (as noted in one of the dependencies of the function I am extracting below). However, what you are asking is done successfully in the pathos package to pass code and dependencies to different machines across ssh-tunneled ports.
>>> import dill
>>>
>>> print dill.source.importable(dill.source.importable)
from dill.source import importable
>>> print dill.source.importable(dill.source.importable, source=True)
def _closuredsource(func, alias=''):
    """get source code for closured objects; return a dict of 'name'
    and 'code blocks'"""
    #FIXME: this entire function is a messy messy HACK
    #      - pollutes global namespace
    #      - fails if name of freevars are reused
    #      - can unnecessarily duplicate function code
    from dill.detect import freevars
    free_vars = freevars(func)
    func_vars = {}
    # split into 'funcs' and 'non-funcs'
    for name,obj in list(free_vars.items()):
        if not isfunction(obj):
            # get source for 'non-funcs'
            free_vars[name] = getsource(obj, force=True, alias=name)
            continue
        # get source for 'funcs'
#…snip… …snip… …snip… …snip… …snip…
            # get source code of objects referred to by obj in global scope
            from dill.detect import globalvars
            obj = globalvars(obj) #XXX: don't worry about alias?
            obj = list(getsource(_obj,name,force=True) for (name,_obj) in obj.items())
            obj = '\n'.join(obj) if obj else ''
            # combine all referred-to source (global then enclosing)
            if not obj: return src
            if not src: return obj
            return obj + src
        except:
            if tried_import: raise
        tried_source = True
        source = not source
    # should never get here
    return
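In practice, the source-extraction route looks roughly like this (a sketch: it assumes the function's source is visible to dill.source, and the resulting string is moved to the remote machine however you like):
import dill.source

def add(x, y):
    return x + y

# Extract an importable chunk of source for the function (plus whatever
# dependencies dill.source can dig up)...
src = dill.source.importable(add, source=True)

# ...ship 'src' as plain text over ssh, then on the remote side exec it
# into a fresh namespace and look the function back up by name.
namespace = {}
exec(src, namespace)
print(namespace['add'](2, 3))    # 5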
I imagine something could also be built around the dill.detect.parents method, which provides a list of pointers to all parent objects for any given object… and one could reconstruct all of any function's dependencies as objects… but this is not implemented.
BTW: to establish an ssh tunnel, just do this:
>>> t = pathos.Tunnel.Tunnel()
>>> t.connect('login.university.edu')
39322
>>> t
Tunnel('-q -N -L39322:login.university.edu:45075 login.university.edu')
Then you can work across the local port with ZMQ, or ssh, or whatever. If you want to do so with ssh, pathos also has that built in.
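For example, with the tunnel above in place, one way to push a dill payload through the tunneled local port is a plain socket (a sketch: 39322 is the local port the Tunnel reported above, and it assumes the remote end of the tunnel is listening and will dill.loads whatever it receives):
import socket
import dill

def task():
    return 6 * 7

payload = dill.dumps(task)

# The tunnel forwards localhost:39322 to the remote machine, so a local
# connection is enough to ship the serialized function.
with socket.create_connection(('localhost', 39322)) as conn:
    conn.sendall(payload)
    conn.shutdown(socket.SHUT_WR)   # signal to the reader that the payload is complete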