If I use dill to pickle a function that relies on a global variable, that global state somehow isn't respected when the function is loaded again. I don't understand dill well enough to be any more specific, but take this working code for example:
import multiprocessing
import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

with multiprocessing.Pool(2, initializer) as pool:
    res = pool.map(worker, range(10))
print(res)
This works fine and prints a list of ten 1s, one per input, as expected. However, if I instead pickle the initializer and worker functions using dill's recurse=True and then restore them, it fails:
import multiprocessing
import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

with open('funcs.pkl', 'wb') as f:
    dill.dump((initializer, worker), f, recurse=True)

with open('funcs.pkl', 'rb') as f:
    initializer, worker = dill.load(f)

with multiprocessing.Pool(2, initializer) as pool:
    res = pool.map(worker, range(2))
This code fails with the following error:
  File "/tmp/ipykernel_158597/1183951641.py", line 9, in worker
    return foo
           ^^^
NameError: name 'foo' is not defined
If I use recurse=False it works fine, but somehow pickling them in this way causes the code to break. Why?
With the recurse=True option, dill.dump builds a new globals dict for each function being serialized, and the objects that the function refers to are serialized recursively along with it. The side effect is that when the functions are deserialized with dill.load, all of these are reconstructed as new objects, including each function's globals dict.
This is why, after deserialization, the two functions no longer share a globals dict. The assignment to foo made by initializer lands in initializer's own globals dict and has no effect on the globals dict of worker, so worker's lookup of foo raises NameError.
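The effect can be reproduced without dill at all. The following is only a rough sketch that uses the standard library's types.FunctionType to rebuild each function over its own fresh globals dict; it mimics the end state described above and is not how dill is implemented. The helper name detach and the variable names are made up for illustration.

```python
import builtins
import types

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

def detach(func):
    # Hypothetical helper: rebuild func over a brand-new globals dict,
    # so it no longer shares module globals with any other function.
    fresh_globals = {"__builtins__": builtins}
    return types.FunctionType(func.__code__, fresh_globals, func.__name__)

detached_initializer = detach(initializer)
detached_worker = detach(worker)

# The "global foo" assignment goes into detached_initializer's own dict only.
detached_initializer()
print("foo" in detached_initializer.__globals__)  # True
print("foo" in detached_worker.__globals__)       # False

try:
    detached_worker(0)
    outcome = "ok"
except NameError as exc:
    outcome = f"NameError: {exc}"
print(outcome)  # NameError: name 'foo' is not defined
```

Because a global statement always writes to the function's own __globals__ dict, two functions can only communicate through a global name if they were built over the same dict.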
You can verify this behavior by checking the identity of the global namespace in which each function object is defined and runs, available as the __globals__ attribute of the function object:
import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

print(id(globals()))
print(id(initializer.__globals__))
print(id(worker.__globals__))

with open('funcs.pkl', 'wb') as f:
    dill.dump((initializer, worker), f, recurse=True)

with open('funcs.pkl', 'rb') as f:
    initializer, worker = dill.load(f)

print('-- dilled --')
print(id(globals()))
print(id(initializer.__globals__))
print(id(worker.__globals__))
This outputs something like:
124817730351552
124817730351552
124817730351552
-- dilled --
124817730351552
124817727897280
124817728060352
Demo: https://replit.com/@blhsing1/RelievedPrimeLaws