I've been reading and re-reading the IPython documentation/tutorial, and I can't figure out the issue with this particular piece of code. It seems that the function dimensionless_run is not visible in the namespace delivered to each of the engines, but I'm confused because the function is defined in __main__ and clearly visible as part of the global namespace.
wrapper.py:
import math, os
from IPython.parallel import Client

def dimensionless_run(inputs):
    output_file = open(inputs['fn'],'w')
    ...
    return output_stats

def parallel_run(inputs):
    import math, os  ## Removing this line causes a NameError: global name 'math'
                     ## is not defined.
    folder = inputs['folder']
    zfill_amt = int(math.floor(math.log10(inputs['num_iters'])))
    for i in range(inputs['num_iters']):
        run_num_str = str(i).zfill(zfill_amt)
        if not os.path.exists(folder + '/'):
            os.mkdir(folder)
        dimensionless_run(inputs)
    return

if __name__ == "__main__":
    inputs = [input1,input2,...]
    client = Client()
    lbview = client.load_balanced_view()
    lbview.block = True
    for x in sorted(globals().items()):
        print x
    lbview.map(parallel_run,inputs)
Executing this code after ipcluster start --n=6 yields the sorted global dictionary, including the math and os modules, and the parallel_run and dimensionless_run functions. This is followed by an IPython.parallel.error.CompositeError: one or more exceptions from call to method: parallel_run, which is composed of a large number of [n:apply]: NameError: global name 'dimensionless_run' is not defined, where n runs from 0-5.
There are two things I don't understand, and they're clearly linked.
1. Why isn't dimensionless_run found in the global namespace?
2. Why is import math, os necessary inside the definition of parallel_run?
Edited: This turned out not to be much of a namespace error at all. I was executing ipcluster start --n=6 in a directory that didn't contain the code. To fix it, all I needed to do was execute the start command in my code's directory. I also fixed it by adding the lines:
inputs = input_pairs
os.system("ipcluster start -n 6") #NEW
client = Client()
...
lbview.map(parallel_run,inputs)
os.system("ipcluster stop") #NEW
which start the required cluster in the right place.
This is mostly a duplicate of Python name space issues with IPython.parallel, which has a more detailed answer, but the gist:
When the Client sends parallel_run to the engine, it just sends that function, not the entire namespace in which the function is defined (the __main__ module). So when running the remote parallel_run, lookups to math or os or dimensionless_run will look first in locals() (what has been defined already in the function, i.e. your in-function imports), then in the globals(), which is the __main__ module on the engine.
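The lookup rule described above can be reproduced locally, without a cluster. The following is only a rough simulation, not what IPython.parallel does internally: it rebuilds a function against an empty globals dictionary (standing in for the engine's __main__), shows the resulting NameError, and then supplies the missing name. The names helper, task, rebuild_in_namespace and engine_globals are made up for this illustration.

```python
import math
import types


def helper(x):
    return math.floor(x)


def task(x):
    # `helper` is not a local, so it is looked up in task's globals at call time.
    return helper(x) + 1


def rebuild_in_namespace(func, namespace):
    """Recreate func so that its global lookups go to `namespace` (hypothetical helper)."""
    return types.FunctionType(func.__code__, namespace, func.__name__)


# Stand-in for an engine whose __main__ knows nothing about this script.
engine_globals = {}
remote_task = rebuild_in_namespace(task, engine_globals)

try:
    remote_task(2.5)
    error_message = None
except NameError as exc:
    # Same kind of failure as the NameError reported by the engines.
    error_message = str(exc)

# Make the missing name available in the simulated engine namespace.
engine_globals["helper"] = helper
result = remote_task(2.5)
```

Note that in this local simulation helper keeps its original globals, so math is still found; on a real engine each function sent from a script would face the same lookup problem.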
There are various approaches to making sure names are available on the engines, but perhaps the simplest is to explicitly define/send them to the engines (the interactive namespace is __main__ on the engines, just like it is locally in IPython):
client[:].execute("import os, math")
client[:]['dimensionless_run'] = dimensionless_run
prior to making your run, in which case everything should work as you expect.
This issue is specific to code defined interactively or in a script. It does not come up if the file is imported as a module rather than run as a script, e.g.
from mymod import parallel_run
lbview.map(parallel_run, inputs)
In that case, globals() is the module's globals, which are generally the same everywhere.
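One way to see why module-level functions travel well is to look at how the standard pickle module handles them. This is a sketch of the general idea only; IPython.parallel's own serialization may differ in detail. A function that lives in an importable module is pickled as a reference (module name plus function name), so whoever unpickles it imports the module and gets the function together with that module's globals. json.dumps is used here purely as a convenient example of such a function.

```python
import json
import pickle

# A function from an importable module is pickled by reference, not by value.
payload = pickle.dumps(json.dumps)

# Unpickling imports the module again and looks the name up in it.
restored = pickle.loads(payload)

# The payload carries the module and function names rather than the code itself.
refers_by_name = b"json" in payload and b"dumps" in payload
same_function = restored is json.dumps
```

This only works if the module (mymod in the answer above) is importable on the engines as well, which ties in with the asker's finding that the cluster had to be started in the code's directory.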