
What is being pickled when I call multiprocessing.Process?

I know that multiprocessing uses pickling in order to have the processes run on different CPUs, but I am a little confused about what exactly is being pickled. Let's look at this code.

from multiprocessing import Process

def f(I):
    print('hello world!',I)

if __name__ == '__main__':
    for I in range(1, 3):
        Process(target=f,args=(I,)).start()

I assume what is being pickled is the def f(I) and the argument going in. First, is this assumption correct?

Second, let's say f(I) contains a function call, like:

def f(I):
    print('hello world!',I)
    randomfunction()

Does the randomfunction's definition get pickled as well, or is it only the function call?

Furthermore, if that function were defined in another file, would the process be able to call it?

Eric Thomas asked Sep 24 '14 20:09


2 Answers

In this particular example, what gets pickled is platform dependent. On systems that support os.fork, like Linux, nothing is pickled here. Both the target function and the args you're passing get inherited by the child process via fork.

On platforms that don't support fork, like Windows, the f function and args tuple will both be pickled and sent to the child process. The child process will re-import your __main__ module, and then unpickle the function and its arguments.

In either case, randomfunction is not actually pickled. When you pickle f, all you're really pickling is a reference that lets the child process rebuild the f function object. This is usually little more than a string that tells the child how to re-import f (output shown here is from Python 2):

>>> import pickle
>>> def f(I):
...     print('hello world!',I)
...     randomfunction()
... 
>>> pickle.dumps(f)
'c__main__\nf\np0\n.'

The child process will just re-import f, and then call it. randomfunction will be accessible as long as it was properly imported into the original script to begin with.
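To make the "only a reference" point concrete, here is a minimal sketch. The module name worker_mod is hypothetical; it is built on the fly only so the snippet behaves the same however it is run, and stands in for your real script or an imported file. The checks at the end suggest that neither the body of f nor the name randomfunction ends up in the pickled bytes.

```python
import pickle
import sys
import types

SOURCE = '''
def randomfunction():
    return "a distinctive string inside randomfunction"

def f(i):
    print("hello world!", i)
    return randomfunction()
'''

# Hypothetical stand-in module (an assumption for this demo, not part of the question).
mod = types.ModuleType("worker_mod")
sys.modules["worker_mod"] = mod
exec(SOURCE, mod.__dict__)

# Pickling a function stores its module name and its own name, nothing else.
payload = pickle.dumps(mod.f)
print(payload)

print(b"worker_mod" in payload)      # the module name is in the payload
print(b"randomfunction" in payload)  # the helper's name is not
print(b"hello world" in payload)     # the function body is not

# Unpickling looks the name up again and returns the very same function object.
print(pickle.loads(payload) is mod.f)
```

So the child only needs to be able to import the module that defines f; whatever f calls is resolved in the child at call time, the same way it would be in the parent.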

Note that in Python 3.4+, you can get the Windows-style behavior on Linux by using contexts:

ctx = multiprocessing.get_context('spawn')
ctx.Process(target=f,args=(I,)).start()  # even on Linux, this will use pickle

The descriptions of the contexts are also probably relevant here, since they apply to Python 2.x as well:

spawn

The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows.

fork

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.

forkserver

When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.

Available on Unix platforms which support passing file descriptors over Unix pipes.

Note that forkserver is only available in Python 3.4 and later; there's no way to get that behavior on 2.x, regardless of the platform you're on.
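Since the available start methods and the default both depend on platform and Python version (macOS, for example, switched its default to spawn in Python 3.8), it may be safer to ask the interpreter than to assume. A small sketch for Python 3.4+:

```python
import multiprocessing

# Start methods supported on this platform, e.g. fork/spawn/forkserver on Linux.
methods = multiprocessing.get_all_start_methods()

# The method that plain multiprocessing.Process() will use in this interpreter.
default = multiprocessing.get_start_method()

print("available:", methods)
print("default:", default)
```

If the default is anything other than fork, the target and args are pickled when you call start(), so they need to be picklable.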

dano answered Sep 28 '22 01:09


The function is pickled, but possibly not in the way you think.

You can look at what's actually in a pickle like this:

import pickle, pickletools
pickletools.dis(pickle.dumps(f))

I get:

 0: c    GLOBAL     '__main__ f'
12: p    PUT        0
15: .    STOP

You'll note that there is nothing in there corresponding to the code of the function. Instead, it has a reference to __main__ f, which is the module and name of the function. So when this is unpickled, it will always attempt to look up the f function in the __main__ module and use that. When you use the multiprocessing module, that ends up being a copy of the same function as it was in your original program.

This does mean that if you somehow change which function is located at __main__.f, you'll end up unpickling a different function than the one you pickled.
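Here is a rough sketch of that effect. It uses a hypothetical throwaway module named demo_mod in place of __main__ (an assumption, so the snippet doesn't depend on how it is launched): the function is pickled, the name is then rebound, and unpickling hands back whatever the name points to at that moment.

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__; the name demo_mod is made up for this demo.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod
exec("def f(i):\n    return ('original', i)\n", mod.__dict__)

# The payload records only "demo_mod" and "f".
payload = pickle.dumps(mod.f)

def replacement(i):
    return ("replacement", i)

# Rebind the name after pickling.
mod.f = replacement

# Unpickling looks up demo_mod.f again and finds the new function.
restored = pickle.loads(payload)
result = restored(1)
print(result)  # ('replacement', 1)
```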

Multiprocessing brings up a complete copy of your program, complete with all the functions you defined in it. So you can just call functions. The entire function isn't copied over, just the name of the function. The pickle module's assumption is that the function will be the same in both copies of your program, so it can just look up the function by name.

Winston Ewert answered Sep 28 '22 01:09