Why does multiprocessing.Process behave differently on Windows and Linux for global objects and function arguments?

The following code produces different output on Windows and Linux (both running Python 2.7):

# import_mock.py
to_mock = None

# test.py
import import_mock
from multiprocessing import Process

class A(object):
    def __init__(self):
        self.a = 1
        self.b = 2
        self.c = 3

    def __getstate__(self):
        # Only called when the instance is pickled; sends c as 0
        print '__getstate__'
        return {'a': self.a, 'b': self.b, 'c': 0}

def func():
    # Changes module-level state in the parent process
    import_mock.to_mock = 1
    a = A()
    return a

def func1(a):
    # Runs in the child process
    print a.a, a.b, a.c
    print import_mock.to_mock


if __name__ == '__main__':
    a = func()
    p = Process(target=func1, args=(a,))
    p.start()
    p.join()

On Windows, the output is:

__getstate__
1 2 0
None

This is what I expected.

On Linux, it is:

1 2 3
1

So on Linux neither the global object nor the passed argument is cloned.

Why do they behave differently? And how can I make the Linux code behave the same as the Windows one?

Patrick asked Jul 07 '16


2 Answers

On Linux (and other Unix-like OSs), Python's multiprocessing module uses fork() to create new child processes, which efficiently inherit a copy of the parent process's memory state. That means the interpreter doesn't need to pickle the objects being passed as the Process's args, since the child process already has them available in their normal form.

Windows, however, has no fork() system call, so the multiprocessing module has to do more work to spawn a child: it starts a fresh Python interpreter and pickles the Process's arguments to send them over. The fork()-based implementation came first, and the non-forking Windows implementation came later.

It's worth noting that the Python developers had long felt it was something of a misfeature for child-process creation to differ so much depending on the platform you run Python on. So Python 3.4 added a way to select the start method you prefer, via multiprocessing.set_start_method() or multiprocessing.get_context(). The options are "fork", "forkserver" and "spawn". The "fork" method remained the default on most Unix-like systems (where it was the only implementation in earlier versions of Python), although macOS switched its default to "spawn" in Python 3.8. The "spawn" method is the default (and only) option on Windows, but can now be used on Unix-like systems too. The "forkserver" method is a sort of hybrid between the two (and only available on some Unix-like systems). You can read more about the differences between the methods in the documentation.
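A minimal sketch of the Python 3.4+ fix, assuming you can require 3.4 or later: multiprocessing.get_context() picks a start method for one group of processes (set_start_method() does the same globally). To keep it in one file, the module-level to_mock stands in for import_mock.to_mock, and child/run are hypothetical names; the child sends back what it saw through a Pipe so the parent can compare the two methods.

```python
import multiprocessing as mp

# Stand-in for import_mock.to_mock from the question (kept in this file for brevity)
to_mock = None


class A(object):
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

    def __getstate__(self):
        # Invoked only if the instance is pickled
        return {'a': self.a, 'b': self.b, 'c': 0}


def child(a, conn):
    # Report what the child process actually sees: the arg's fields and the global
    conn.send(((a.a, a.b, a.c), to_mock))
    conn.close()


def run(method):
    global to_mock
    to_mock = 1  # mutate module state in the parent, like func() in the question
    ctx = mp.get_context(method)
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=child, args=(A(), child_conn))
    p.start()
    result = parent_conn.recv()
    p.join()
    return result


if __name__ == '__main__':
    print(run('spawn'))  # ((1, 2, 0), None)
    if 'fork' in mp.get_all_start_methods():
        print(run('fork'))  # ((1, 2, 3), 1)
```

Under "spawn" the child reports ((1, 2, 0), None), matching the question's Windows output; under "fork" it reports ((1, 2, 3), 1), matching Linux. The get_all_start_methods() check only skips the fork run on Windows, where that method doesn't exist.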

Blckknght answered Sep 18 '22


Adding to @Blckknght's answer: on Windows, each process imports the original module "from scratch", while on Unix-y systems only the main process runs the whole module, and all other processes see whatever exists at the time fork() creates them. (No, you're not calling fork() yourself; multiprocessing's internals call it whenever they create a new process.)

In detail, for your import_mock:

  • On all platforms, the main process calls func(), which sets import_mock.to_mock to 1.

  • On Unix-y platforms, that's what all new processes see: the fork() occurs after that, so 1 is the state all new processes inherit.

  • On Windows, all new processes run the entire module "from scratch". So they each import their own, brand new version of import_mock. Only the main process calls func(), so only the main process sees to_mock change to 1. All other processes see the fresh None state.
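A small sketch of the "from scratch" point, assuming Python 3.4+ for get_context(): the module records which process ran its top-level code, and a child reports that back along with the module's __name__. LOADED_IN_PID, report and check are hypothetical names.

```python
import multiprocessing as mp
import os

# Top-level code: records which process executed this module
LOADED_IN_PID = os.getpid()


def report(conn):
    # Module name as seen by the child, and whether the child ran the top-level code itself
    conn.send((__name__, LOADED_IN_PID == os.getpid()))
    conn.close()


def check(method):
    ctx = mp.get_context(method)
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=report, args=(child_conn,))
    p.start()
    result = parent_conn.recv()
    p.join()
    return result


if __name__ == '__main__':
    print(check('spawn'))  # ('__mp_main__', True)
    if 'fork' in mp.get_all_start_methods():
        print(check('fork'))  # ('__main__', False)
```

Under "spawn" the module is re-executed in the child under the name "__mp_main__", so anything inside if __name__ == '__main__': (such as the call to func() in the question) is skipped there; under "fork" the child simply inherits the parent's already-loaded module.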

That's all expected, and actually easy to understand the second time ;-)

What happens when passing a is subtler, because it depends more on multiprocessing implementation details. The implementation could have chosen to pickle arguments on all platforms from the start, but it didn't, and now it's too late to change that without breaking things on some platforms.

Because of copy-on-write fork() semantics, it wasn't necessary to pickle Process() arguments on Unix-y systems, so the implementation never did. Without fork(), however, it is necessary to pickle them on Windows, so there the implementation does.
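To see what pickling does to the argument without starting any processes, you can round-trip the question's class through pickle yourself. This is only an approximation of what multiprocessing does on Windows (internally it uses its own subclass of pickle.Pickler), written for Python 3.

```python
import pickle


class A(object):
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

    def __getstate__(self):
        # Custom state: 'c' is deliberately replaced by 0
        print('__getstate__')
        return {'a': self.a, 'b': self.b, 'c': 0}


original = A()
# Roughly what happens to Process args when they have to be pickled (Windows / spawn)
clone = pickle.loads(pickle.dumps(original))
print(original.a, original.b, original.c)  # 1 2 3
print(clone.a, clone.b, clone.c)           # 1 2 0
```

With no __setstate__ defined, unpickling just loads the returned dict into the new instance's __dict__, which is why the Windows child sees c == 0.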

Before Python 3.4, which lets you force "the Windows implementation" (spawn) on all platforms, there was no mechanical way to avoid possible cross-platform surprises.
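On versions without start methods, one manual workaround for the argument half of the problem is to pickle the object yourself, pass the bytes, and unpickle in the child, so fork and spawn see the same thing. This is a hedged sketch rather than anything from the answers: it does nothing for module-level globals such as import_mock.to_mock, and start_with_pickled_arg is a hypothetical helper. It is written so it should also run on Python 2.7.

```python
import pickle
from multiprocessing import Pipe, Process


class A(object):
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

    def __getstate__(self):
        return {'a': self.a, 'b': self.b, 'c': 0}


def func1(payload, conn):
    # Unpickle explicitly, so the child sees the same object on every platform
    a = pickle.loads(payload)
    conn.send((a.a, a.b, a.c))
    conn.close()


def start_with_pickled_arg(obj):
    parent_conn, child_conn = Pipe()
    # Pickle in the parent; a bytes payload is treated the same under fork and spawn
    p = Process(target=func1, args=(pickle.dumps(obj), child_conn))
    p.start()
    result = parent_conn.recv()
    p.join()
    return result


if __name__ == '__main__':
    print(start_with_pickled_arg(A()))  # (1, 2, 0)
```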

But in practice, I've rarely been bothered by this. Knowing, for example, that multiprocessing can depend heavily on pickling, I stay well clear of playing tricks with pickles. The only reason you had "a problem" passing an A() instance is that you are playing pickle tricks (by overriding the default __getstate__()).

Tim Peters answered Sep 21 '22