The following code produces different output when run on Windows and on Linux (both with Python 2.7).
'''import_mock.py'''

to_mock = None

'''test.py'''

import import_mock
from multiprocessing import Process

class A(object):
    def __init__(self):
        self.a = 1
        self.b = 2
        self.c = 3

    def __getstate__(self):
        print '__getstate__'
        return {'a': self.a, 'b': self.b,
                'c': 0}

def func():
    import_mock.to_mock = 1
    a = A()
    return a

def func1(a):
    print a.a, a.b, a.c
    print import_mock.to_mock

if __name__ == '__main__':
    a = func()
    p = Process(target=func1, args=(a,))
    p.start()
    p.join()
On Windows, the output is:
__getstate__
1 2 0
None
which is what I expected.
On Linux, it is:
1 2 3
1
which suggests that neither the global state nor the passed argument was copied for the child process.
My question is: why do they behave differently? And how can I make the Linux code behave the same as the Windows one?
On Linux (and other Unix-like OSs), Python's multiprocessing module uses fork() to create new child processes, which efficiently inherit a copy of the parent process's memory state. That means the interpreter doesn't need to pickle the objects being passed as the Process's args, since the child process already has them available in their normal form.
Windows doesn't have a fork() system call, however, so the multiprocessing module needs to do a bit more work to make child-spawning work. The fork()-based implementation came first, and the non-forking Windows implementation came later.
It's worth noting that the Python developers have often felt it was a bit of a misfeature for the creation of child processes to differ so much based on the platform you're running Python on. So in Python 3.4, a new system was added that lets you select the start method you prefer. The options are "fork", "forkserver", and "spawn". The "fork" method remains the default on Unix-like systems (where it was the only implementation in earlier versions of Python). The "spawn" method is the default (and only) option on Windows, but it can now be used on Unix-like systems too. The "forkserver" method is a sort of hybrid between the two (and is only available on some Unix-like systems). You can read more about the differences between the methods in the documentation.
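As a brief sketch of that selection API (Python 3.4+; the names used are the stdlib multiprocessing module's own):

```python
import multiprocessing as mp

# Which start methods exist depends on the platform: "spawn" is available
# everywhere, while "fork" and "forkserver" are Unix-only.
available = mp.get_all_start_methods()

# get_context() returns a context object bound to one method, without
# touching the process-wide default chosen by set_start_method().
ctx = mp.get_context("spawn")

# Process objects created through the context use that method:
# p = ctx.Process(target=some_func)
```

A Process created via ctx.Process(...) then uses spawn semantics even on Linux, so globals and arguments behave as they do on Windows.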
Adding to @Blckknght's answer: on Windows, each process imports the original module "from scratch", while on Unix-y systems only the main process runs the whole module; all other processes see whatever exists at the time fork() is used to create the new processes (no, you're not calling fork() yourself - multiprocessing internals call it whenever it creates a new process).
In detail, for your import_mock:
On all platforms, the main process calls func(), which sets import_mock.to_mock to 1.
On Unix-y platforms, that's what all new processes see: the fork() occurs after that, so 1 is the state all new processes inherit.
On Windows, all new processes run the entire module "from scratch". So they each import their own, brand-new version of import_mock. Only the main process calls func(), so only the main process sees to_mock change to 1. All other processes see the fresh None state.
That's all expected, and actually easy to understand the second time ;-)
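That fork()-time inheritance can be seen without multiprocessing at all. A minimal, Unix-only sketch (os.fork() does not exist on Windows; the pipe plumbing here is just illustration, not part of the original code):

```python
import os

to_mock = None  # stands in for import_mock.to_mock

def demo_fork_inheritance():
    global to_mock
    to_mock = 1                  # what func() does in the parent
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits the parent's memory, so it sees 1, not None.
        os.write(w, str(to_mock).encode())
        os._exit(0)              # don't let the child run past this point
    os.close(w)
    value = os.read(r, 16).decode()
    os.close(r)
    os.waitpid(pid, 0)
    return value
```

A spawn-style child, by contrast, re-imports the module from scratch and would report the fresh None.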
What's going on with passing a is subtler, because it depends more on multiprocessing implementation details. The implementation could have chosen to pickle arguments on all platforms from the start, but it didn't, and now it's too late to change without breaking things on some platforms.
Because of copy-on-write fork() semantics, it wasn't necessary to pickle Process() arguments on Unix-y systems, and so the implementation never did. Without fork(), however, it is necessary to pickle them on Windows - and so the implementation does.
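That pickling step is also what produces the 1 2 0 line in the Windows output: the child rebuilds the instance from the dict returned by __getstate__(), which deliberately zeroes c. A standalone pickle round-trip (Python 3 syntax here, unlike the Python 2 question) shows the same effect:

```python
import pickle

class A(object):
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

    def __getstate__(self):
        # pickle calls this to get the state to serialize; on unpickling,
        # the default __setstate__ just updates the new object's __dict__.
        return {'a': self.a, 'b': self.b, 'c': 0}

copy = pickle.loads(pickle.dumps(A()))  # copy.c is 0, not 3
```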
Before Python 3.4, which allows you to force "the Windows implementation" (spawn) on all platforms, there's no mechanical way to avoid possible cross-platform surprises.
But in practice, I've rarely been bothered by this. Knowing that, for example, multiprocessing can depend heavily on pickling, I steer completely clear of playing tricks with pickles. The only reason you had "a problem" passing an A() instance is that you were playing pickle tricks (by overriding the default __getstate__()).