Why does recurse=True cause dill not to respect globals in functions?

Tags: python, dill

If I pickle a function with dill that uses a global, that global state somehow isn't respected when the function is loaded again. I don't understand enough about dill to be any more specific, but take this working code for example:

import multiprocessing
import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo
   
with multiprocessing.Pool(2, initializer) as pool:
    res = pool.map(worker, range(2))

print(res)

This works fine, and prints [1, 1] as expected. However, if I instead pickle the initializer and worker functions using dill's recurse=True, and then restore them, it fails:

import multiprocessing
import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

with open('funcs.pkl', 'wb') as f:
    dill.dump((initializer, worker), f, recurse=True)

with open('funcs.pkl', 'rb') as f:
    initializer, worker = dill.load(f)

with multiprocessing.Pool(2, initializer) as pool:
    res = pool.map(worker, range(2))

This code fails with the following error:

  File "/tmp/ipykernel_158597/1183951641.py", line 9, in worker
    return foo
           ^^^
NameError: name 'foo' is not defined

If I use recurse=False it works fine, but somehow pickling them in this way causes the code to break. Why?

asked Nov 18 '25 17:11 by quant


1 Answer

With the recurse=True option, dill.dump builds a new globals dict for each function being serialized, containing the objects that the function refers to, and serializes those objects recursively as well. The side effect is that dill.load reconstructs all of these as new objects, including each function's globals dict.

This is why, after deserialization, the two functions no longer share a globals dict. When initializer assigns foo, it writes to its own globals dict, which has no effect on the globals dict of worker, so worker fails with a NameError when it looks up foo.
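The effect can be reproduced without dill at all. The following is only a minimal sketch that mimics the outcome described above using the standard library; it is not how dill is implemented. The helper rebuild_with_fresh_globals is a hypothetical name introduced for illustration: it copies a function onto a brand-new globals dict, much like a loaded function ending up with its own namespace.

```python
import builtins
import types

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

def rebuild_with_fresh_globals(func):
    """Hypothetical helper: copy func onto a new, private globals dict."""
    fresh_globals = {"__builtins__": builtins}
    return types.FunctionType(func.__code__, fresh_globals, func.__name__)

init_copy = rebuild_with_fresh_globals(initializer)
worker_copy = rebuild_with_fresh_globals(worker)

# The assignment to foo lands in init_copy's own globals dict only.
init_copy()

separate_dicts = init_copy.__globals__ is not worker_copy.__globals__
foo_in_init = "foo" in init_copy.__globals__
foo_in_worker = "foo" in worker_copy.__globals__

# worker_copy looks up foo in its own globals dict, where it was never set.
try:
    worker_copy(0)
    raised_name_error = False
except NameError:
    raised_name_error = True

print(separate_dicts, foo_in_init, foo_in_worker, raised_name_error)
```

Here foo ends up in the globals dict of init_copy but not in that of worker_copy, so calling worker_copy raises the same NameError as in the question.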

You can verify this behavior by checking the identity of the global namespace that each function runs under, available as the __globals__ attribute of the function object:

import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

# Before pickling: compare each function's globals dict with the module's
print(id(globals()))
print(id(initializer.__globals__))
print(id(worker.__globals__))

with open('funcs.pkl', 'wb') as f:
    dill.dump((initializer, worker), f, recurse=True)

with open('funcs.pkl', 'rb') as f:
    initializer, worker = dill.load(f)

print('-- dilled --')
# After loading: repeat the same comparison on the restored functions
print(id(globals()))
print(id(initializer.__globals__))
print(id(worker.__globals__))

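This outputs something like the following. Before pickling, all three ids are the same; after loading, each function reports a different globals dict, and neither matches the module's: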

124817730351552
124817730351552
124817730351552
-- dilled --
124817730351552
124817727897280
124817728060352

Demo: https://replit.com/@blhsing1/RelievedPrimeLaws

answered Nov 20 '25 07:11 by blhsing


