
multiprocessing.Process (with spawn method): which objects are inherited?

The docs (Python 3.4) explain that with spawn, "the child process will only inherit those resources necessary to run the process object's run() method".

But which objects are "necessary"? The way I read it, all the objects that can be reached from inside run() are "necessary": the arguments passed as args to Process.__init__, whatever is stored in global variables, and the classes and functions defined in global scope along with their attributes. However, this is incorrect; the following code confirms that the objects stored in global variables aren't inherited:

# running under python 3.4 / Windows
# but behaves the same under Unix
import multiprocessing as mp

x = 0
class A:
    y = 0

def f():
    print(x) # 0
    print(A.y) # 0

def g(x, A):
    print(x) # 1
    print(A.y) # 0; really, not even args are inherited?

def main():
    global x
    x = 1
    A.y = 1
    p = mp.Process(target = f)
    p.start()
    q = mp.Process(target = g, args = (x, A))
    q.start()


if __name__=="__main__":
    mp.set_start_method('spawn')
    main()

Is there a clear rule that states which objects are inherited?

EDIT:

To confirm: running this on Ubuntu produces the same output. (Thanks to @mata for pointing out that I forgot to add global x to main(). This omission made my example confusing; it would also affect the result if I were to switch 'spawn' to 'fork' under Ubuntu. I have now added global x to the code above.)

Asked by max on Mar 22 '15 at 19:03




1 Answer

This has to do with the way classes are pickled when being sent to the spawned Process. The pickled version of a class doesn't really contain its internal state, but only the module and the name of the class:

import pickle

class A:
    y = 0

pickle.dumps(A)
# b'\x80\x03c__main__\nA\nq\x00.'  (protocol 3, the default in Python 3.4)

There is no information about y here; the pickle is comparable to a reference to the class.

When the class is passed as an argument to g, it is unpickled in the spawned process: its module (here __main__) is imported if necessary, and a reference to the class in that freshly imported module is returned. Changes made to the class in your main function therefore don't show up in the child, because the if __name__ == "__main__" block isn't executed in the subprocess. f uses the class directly from its module, so the effect is basically the same.
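As a rough illustration of the point above, here is a minimal sketch that stays in a single process (no spawn involved). It pickles the class before and after changing y and compares the two payloads. The names before and after are only for this example.

```python
import pickle


class A:
    y = 0


before = pickle.dumps(A)   # pickle the class while y == 0
A.y = 1                    # change the class attribute
after = pickle.dumps(A)    # pickle the class again while y == 1

# The two payloads are byte-for-byte identical: y is not stored in either.
print(before == after)

# Unpickling looks the class up by module and name. In the same process
# that lookup returns the very same class object.
print(pickle.loads(after) is A)
```

Both lines should print True when this is run as a script. In a spawned child the same lookup happens against a freshly imported module, which is why the child sees the class as it was defined rather than as the parent modified it.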

The reason x shows different values is a little different. Your f function prints the module-level global x. In the original version of your main() (before global x was added), x = 1 created a separate local variable, so it didn't affect the module-level x in either process. With global x, the assignment changes the module-level x in the parent only; the spawned child re-imports the module and never calls main(), so f still sees 0 there. x is passed to g as an argument, so g always prints the value 1.
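Since arguments do reach the child by value, one way to get the parent's current A.y into a spawned process is to pass the value itself instead of the class. The following is only a sketch under that assumption; report and child_args are hypothetical helper names, not part of the original code.

```python
import multiprocessing as mp


class A:
    y = 0


def report(x, y):
    # Runs in the child. Both values arrive as pickled arguments, so they
    # reflect the parent's state at the time start() was called.
    print(x, y)


def child_args(x):
    # Read A.y in the parent and pass the plain int along. Passing A itself
    # would only send a reference to the class by module and name.
    return (x, A.y)


if __name__ == "__main__":
    ctx = mp.get_context("spawn")   # spawn context, without changing the global default
    A.y = 1
    p = ctx.Process(target=report, args=child_args(1))
    p.start()
    p.join()
```

Run as a script, this should print 1 1 from the child, even though A.y is 0 again in the child's own copy of the module.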

Answered by mata on Sep 21 '22 at 14:09