I'm creating objects of a class (a subclass of multiprocessing.Process) and adding them to a Manager.dict(), so that each object can delete its own entry from the dictionary when its work completes.
I tried the following code:

    from multiprocessing import Manager, Process

    class My_class(Process):
        def __init__(self):
            super(My_class, self).__init__()
            print "Object", self, "created."

        def run(self):
            print "Object", self, "process started."

    manager=Manager()
    object_dict=manager.dict()
    for x in range(2):
        object_dict[x]=My_class()
        object_dict[x].start()

But I got an error:

    TypeError: Pickling an AuthenticationString object is disallowed for security reasons

Out of curiosity, I removed the multiprocessing part and tried this:

    from multiprocessing import Manager

    class My_class():
        def __init__(self):
            print "Object", self, "created."

    manager=Manager()
    object_dict=manager.dict()
    for x in range(2):
        object_dict[x]=My_class()

This version runs without errors and prints the addresses of the two objects.
What does that error mean, and how do I make it go away?
Here is a shorter way to replicate the effect you are seeing:

    from multiprocessing import Process
    import pickle

    p = Process()
    pickle.dumps(p._config['authkey'])

    TypeError: Pickling an AuthenticationString object is disallowed for security reasons

What is actually happening here is the following: process._config['authkey'] is the secret key that the Process object gets assigned on creation. Although this key is nothing more than a sequence of bytes, Python uses a special subclass of bytes to represent it: AuthenticationString. This subclass differs from the usual bytes in only one respect: it refuses to be pickled.
The rationale behind this choice is the following: the authkey is used for authenticating inter-process communication messages between parent and child processes (e.g. between the workers and the main process). Exposing it anywhere outside the initial process family could pose a security risk, because you could, in principle, impersonate a "parent process" for the worker and force it into executing arbitrary code. As pickling is the most common form of data transfer in Python, prohibiting it is a simple way of preventing unintended exposure of the authkey.
As you cannot pickle an AuthenticationString, you also cannot pickle instances of the Process class or any of its subclasses (because all of them contain an authentication key in a field).
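The behaviour described above can be checked directly, without starting any child process. The snippet below is a minimal sketch for Python 3; the helper name try_pickle is made up for illustration and is not part of multiprocessing.

```python
import pickle
from multiprocessing import Process
from multiprocessing.process import AuthenticationString


def try_pickle(obj):
    """Return the name of the exception raised by pickle.dumps, or None on success.

    Hypothetical helper, used only for this demonstration.
    """
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return type(exc).__name__
    return None


p = Process()      # created but never started
key = p.authkey    # public accessor for the same value as p._config['authkey']

# The key is a bytes subclass, not a plain bytes object.
is_bytes_subclass = isinstance(key, bytes) and isinstance(key, AuthenticationString)

key_error = try_pickle(key)           # the key itself refuses to be pickled
plain_error = try_pickle(bytes(key))  # a plain bytes copy pickles without complaint
process_error = try_pickle(p)         # the Process object fails because it holds the key

print(is_bytes_subclass, key_error, plain_error, process_error)
```

Converting the key with bytes() is what makes the plain copy picklable, which is also the idea behind the workaround in the second answer.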
Now let us take a look at how it all relates to your code. You create a Manager object and attempt to set the values of its dict. The Manager actually runs in a separate process, and whenever you assign any data to manager.dict(), Python needs to transfer this data to the Manager's own process. For that transfer to happen, the data is pickled. But, as we know from the previous paragraphs, you cannot pickle Process objects and hence cannot keep them in a shared dict at all.
In short, you are free to use manager.dict() to share any objects, except those which cannot be pickled, such as the Process objects.
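One way to stay within that restriction is to keep the Process objects in an ordinary local dict and put only picklable values into the shared one. The following is a rough sketch for Python 3, not a definitive design: the class name Worker, the attributes key and registry, and the "running" marker are all invented for illustration, and it assumes each child only needs to remove its own entry when it finishes.

```python
from multiprocessing import Manager, Process


class Worker(Process):  # hypothetical name, not from the question
    def __init__(self, key, registry):
        super().__init__()
        self.key = key            # this worker's key in the shared dict
        self.registry = registry  # a manager dict proxy (any dict-like object works)

    def run(self):
        # ... the real work would go here ...
        # Remove this worker's own entry once the work is done.
        self.registry.pop(self.key, None)


if __name__ == "__main__":
    with Manager() as manager:
        registry = manager.dict()
        workers = {}  # Process objects stay in an ordinary, local dict
        for x in range(2):
            registry[x] = "running"  # only picklable data goes into the shared dict
            workers[x] = Worker(x, registry)
            workers[x].start()
        for worker in workers.values():
            worker.join()
        # By now each worker should have removed its own entry.
        print(dict(registry))
```

The dict proxy itself can be handed to a child process, so the child can modify the shared dictionary even though the Process object is never stored in it.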
Note: the solution below is written for Python 3 (hence print() as a function). The same issue exists in Python 3 as well.
Well, in your specific example, we can work around the problem by pickling the AuthenticationString inside the _config dict (which is part of the Process object) as a plain bytes buffer, and then gracefully restoring it when unpickling, as if nothing happened. Define the __getstate__ and __setstate__ methods, which are called for pickling operations, inside My_class as follows:

    from multiprocessing import Manager, Process
    from multiprocessing.process import AuthenticationString

    class My_class(Process):
        def __init__(self):
            super(My_class, self).__init__()
            print("Object", self, "created.")

        def run(self):
            print("Object", self, "process started.")

        def __getstate__(self):
            """called when pickling - this hack allows subprocesses to
               be spawned without the AuthenticationString raising an error"""
            state = self.__dict__.copy()
            # copy _config too, so the live object keeps its AuthenticationString
            conf = state['_config'] = dict(state['_config'])
            if 'authkey' in conf:
                #del conf['authkey']
                conf['authkey'] = bytes(conf['authkey'])
            return state

        def __setstate__(self, state):
            """for unpickling"""
            state['_config']['authkey'] = AuthenticationString(state['_config']['authkey'])
            self.__dict__.update(state)

    if __name__ == '__main__': # had to add this
        manager=Manager()
        object_dict=manager.dict()
        for x in range(2):
            object_dict[x]=My_class()
            object_dict[x].start()

I get the following output from running the code:

    Object <My_class(My_class-2, initial)> created.
    Object <My_class(My_class-3, initial)> created.
    Object <My_class(My_class-2, started)> process started.
    Object <My_class(My_class-3, started)> process started.

This appears to be the intended outcome, and if you put a time.sleep() call in to keep them alive a bit longer, you can see the two subprocesses running.
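The pickling round trip on its own can also be checked in a single process, with no manager involved. This is only a sketch under a few assumptions: the class name PicklableProcess is made up, the code relies on the private _config attribute (which may change between Python versions), and it copies _config before editing it so that the original object is left untouched.

```python
import pickle
from multiprocessing import Process
from multiprocessing.process import AuthenticationString


class PicklableProcess(Process):  # hypothetical name for this demonstration
    def __getstate__(self):
        state = self.__dict__.copy()
        config = dict(state['_config'])  # copy, so the live object is not modified
        if 'authkey' in config:
            config['authkey'] = bytes(config['authkey'])  # plain bytes can be pickled
        state['_config'] = config
        return state

    def __setstate__(self, state):
        config = state['_config']
        if 'authkey' in config:
            # Restore the special type on the unpickled copy.
            config['authkey'] = AuthenticationString(config['authkey'])
        self.__dict__.update(state)


original = PicklableProcess()                  # created but never started
clone = pickle.loads(pickle.dumps(original))   # would raise TypeError for a plain Process

print(type(clone.authkey).__name__)
print(bytes(clone.authkey) == bytes(original.authkey))
```

The clone carries the same key bytes as the original, wrapped again in AuthenticationString, which is roughly what a copy read back from the manager dict would look like.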
Alternatively, it doesn't seem to upset anything if you simply delete the authkey entry from _config, in which case you don't even need to define a custom __setstate__ method.
Also, note that I had to add the if __name__ == '__main__': guard. Without it, Python complained about not having finished its bootstrapping before launching subprocesses.
Finally, I just have to shrug my shoulders at this whole "security" thing. It pops up in all sorts of places (with the same type of work-around required) and doesn't provide any real security.