I know that multiprocessing
uses pickling in order to have the processes run on different CPUs, but I think I am a little confused as to what is being pickled. Lets look at this code.
from multiprocessing import Process
def f(I):
print('hello world!',I)
if __name__ == '__main__':
for I in (range1, 3):
Process(target=f,args=(I,)).start()
I assume what is being pickled is the def f(I)
and the argument going in. First, is this assumption correct?
Second, lets say f(I)
has a function call within in it like:
def f(I):
print('hello world!',I)
randomfunction()
Does the randomfunction
's definition get pickled as well, or is it only the function call?
Further more, if that function call was located in another file, would the process be able to call it?
In this particular example, what gets pickled is platform dependent. On systems that support os.fork
, like Linux, nothing is pickled here. Both the target function and the args you're passing get inherited by the child process via fork
.
On platforms that don't support fork
, like Windows, the f
function and args
tuple will both be pickled and sent to the child process. The child process will re-import your __main__
module, and then unpickle the function and its arguments.
In either case, randomfunction
is not actually pickled. When you pickle f
, all you're really pickling is a pointer for the child function to re-build the f
function object. This is usually little more than a string that tells the child how to re-import f
:
>>> def f(I):
... print('hello world!',I)
... randomfunction()
...
>>> pickle.dumps(f)
'c__main__\nf\np0\n.'
The child process will just re-import f
, and then call it. randomfunction
will be accessible as long as it was properly imported into the original script to begin with.
Note that in Python 3.4+, you can get the Windows-style behavior on Linux by using contexts:
ctx = multiprocessing.get_context('spawn')
ctx.Process(target=f,args=(I,)).start() # even on Linux, this will use pickle
The descriptions of the contexts are also probably relevant here, since they apply to Python 2.x as well:
spawn
The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process objects run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
Available on Unix and Windows. The default on Windows.
fork
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
Available on Unix only. The default on Unix.
forkserver
When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
Available on Unix platforms which support passing file descriptors over Unix pipes.
Note that forkserver
is only available in Python 3.4, there's no way to get that behavior on 2.x, regardless of the platform you're on.
The function is pickled, but possibly not in the way you think of it:
You can look at what's actually in a pickle like this:
pickletools.dis(pickle.dumps(f))
I get:
0: c GLOBAL '__main__ f'
12: p PUT 0
15: . STOP
You'll note that there is nothing in there correspond to the code of the function. Instead, it has references to __main__ f
which is the module and name of the function. So when this is unpickled, it will always attempt to lookup the f
function in the __main__
module and use that. When you use the multiprocessing module, that ends up being a copy of the same function as it was in your original program.
This does mean that if you somehow modify which function is located at __main__.f
you'll end up unpickling a different function then you pickled in.
Multiprocessing brings up a complete copy of your program complete with all the functions you defined it. So you can just call functions. The entire function isn't copied over, just the name of the function. The pickle module's assumption is that function will be same in both copies of your program, so it can just lookup the function by name.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With