Why does multiprocessing.Process behave differently on Windows and Linux for global objects and function arguments?

Tags:

multiprocessing

The following code produces different output on Windows and Linux (both running Python 2.7):

'''import_mock.py'''
to_mock = None


'''test.py'''
import import_mock
from multiprocessing import Process

class A(object):
    def __init__(self):
        self.a = 1
        self.b = 2
        self.c = 3

    def __getstate__(self):
        print '__getstate__'
        return { 'a': self.a, 'b': self.b,
                 'c':0 }

def func():
    import_mock.to_mock = 1
    a = A()
    return a

def func1(a):
    print a.a, a.b, a.c
    print import_mock.to_mock


if __name__ == '__main__':
    a = func()
    p = Process(target=func1, args=(a,))
    p.start()
    p.join()

On Windows, the output is:


__getstate__
1 2 0
None

This is what I expected.

On Linux, it is:


1 2 3
1

Here the child gets neither a fresh copy of the global object nor a pickled copy of the passed argument.

Why do they behave differently, and how can I make the Linux code behave the same as the Windows code?


asked Jul 07 '16 01:07 by Patrick


2 Answers

On Linux (and other Unix-like OSs), Python's multiprocessing module uses fork() to create new child processes, which efficiently inherit a copy of the parent process's memory state. That means the interpreter doesn't need to pickle the objects passed as the Process's args, since the child process already has them available in their normal form.
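A minimal sketch of that behavior in Python 3, assuming Python 3.4+ (for get_context()) and a POSIX system, since it forces the "fork" start method. A mirrors the question's class; report and run_with_fork are hypothetical helper names. Because the child inherits memory instead of receiving a pickle, __getstate__ is never called and c stays 3:

```python
import multiprocessing as mp

class A:
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

    def __getstate__(self):
        # Only called if the object is pickled; it deliberately reports c as 0
        return {"a": self.a, "b": self.b, "c": 0}

def report(obj, queue):
    # Runs in the child: sends back the attribute values it actually sees
    queue.put((obj.a, obj.b, obj.c))

def run_with_fork():
    ctx = mp.get_context("fork")  # POSIX only; Windows raises ValueError here
    queue = ctx.Queue()
    p = ctx.Process(target=report, args=(A(), queue))
    p.start()
    seen = queue.get(timeout=30)
    p.join()
    return seen

if __name__ == "__main__":
    print(run_with_fork())  # (1, 2, 3): the child got a copy of memory, not a pickle
```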

Windows, however, doesn't have a fork() system call, so the multiprocessing module needs to do more work to create a child: it starts a fresh Python interpreter and pickles the Process's arguments to send them to it. The fork()-based implementation came first, and the non-forking Windows implementation came later.
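To see what that pickling step does to the question's object without needing Windows, a plain pickle round trip is a reasonable approximation. This is a simplified Python 3 sketch: multiprocessing actually pickles the whole Process object with its own pickler subclass, but the effect on the argument is the same kind of transformation:

```python
import pickle

class A:
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

    def __getstate__(self):
        print("__getstate__")
        return {"a": self.a, "b": self.b, "c": 0}

# Roughly what happens to a Process argument without fork():
# it is pickled in the parent and unpickled in the child.
# With no __setstate__, the returned dict becomes the new instance's __dict__.
clone = pickle.loads(pickle.dumps(A()))
print(clone.a, clone.b, clone.c)  # 1 2 0
```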

It's worth noting that the Python developers have long felt it was a bit of a misfeature for the creation of child processes to differ so much depending on the platform you're running Python on. So Python 3.4 added a way to select the start method you prefer, via multiprocessing.set_start_method() or multiprocessing.get_context(). The options are "fork", "forkserver" and "spawn". The "fork" method remained the default on Unix-like systems (where it was the only implementation in earlier versions of Python), although macOS switched to "spawn" in Python 3.8 and Python 3.14 made "forkserver" the default on other POSIX platforms. The "spawn" method is the default (and only) option on Windows, but it can now be used on Unix-like systems too. The "forkserver" method is a sort of hybrid between the two (and is only available on some Unix-like systems). You can read more about the differences between the methods in the documentation.
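For the second half of the question, here is a sketch of test.py ported to Python 3 with the "spawn" start method forced, so Linux should behave like Windows. Assumptions: Python 3.4+, and, to keep it to one file, a hypothetical module-level global to_mock stands in for import_mock.to_mock. Under spawn the child re-imports the script (without running the `__main__` block) and receives a pickled copy of a, so it should print 1 2 0 and None, while the "__getstate__" line comes from the parent as it pickles the arguments:

```python
import multiprocessing as mp

# Hypothetical stand-in for import_mock.to_mock, kept in this file for brevity
to_mock = None

class A:
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

    def __getstate__(self):
        # Called whenever the object is pickled (here: in the parent, by spawn)
        print("__getstate__")
        return {"a": self.a, "b": self.b, "c": 0}

def func():
    global to_mock
    to_mock = 1
    return A()

def func1(a):
    print(a.a, a.b, a.c)
    print(to_mock)

if __name__ == "__main__":
    # Force the Windows-style start method; call this at most once,
    # before any process is created
    mp.set_start_method("spawn")
    a = func()
    p = mp.Process(target=func1, args=(a,))
    p.start()
    p.join()
```

If you'd rather not change the program-wide default, creating the Process from mp.get_context("spawn") selects the same start method for just that process.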


answered Sep 18 '22 12:09

Blckknght

Adding to @Blckknght's answer: on Windows, each process imports the original module "from scratch", whereas on Unix-y systems only the main process runs the whole module; all other processes see whatever exists at the time fork() is used to create them (no, you're not calling fork() yourself - multiprocessing internals call it whenever they create a new process).

In detail, for your import_mock:

On all platforms, the main process calls func(), which sets import_mock.to_mock to 1.
On Unix-y platforms, that's what all new processes see: the fork() occurs after that, so 1 is the state all new processes inherit.
On Windows, all new processes run the entire module "from scratch". So they each import their own, brand new version of import_mock. Only the main process calls func(), so only the main process sees to_mock change to 1. All other processes see the fresh None state.

That's all expected, and actually easy to understand the second time ;-)

What's going on with passing a is subtler, because it depends more on multiprocessing implementation details. The implementation could have chosen to pickle arguments on all platforms from the start, but it didn't, and now it's too late to change without breaking stuff on some platforms.

Because of copy-on-write fork() semantics, it wasn't necessary to pickle Process() arguments on Unix-y systems, and so the implementation never did. However, without fork() it is necessary to pickle them on Windows - and so the implementation does.

Before Python 3.4, which allows you to force "the Windows implementation" (spawn) on all platforms, there's no mechanical way to avoid possible cross-platform surprises.

But in practice, I've rarely been bothered by this. Knowing that, for example, multiprocessing can depend heavily on pickling, I stay completely clear of getting anywhere near playing tricks with pickles. The only reason you had "a problem" passing an A() instance is that you are playing pickle tricks (via overriding the default __getstate__()).

answered Sep 21 '22 12:09

Tim Peters
