Python multiprocessing.Process object behaves like it would hold a reference to an object in another process. Why?

import multiprocessing as mp

def delay_one_second(event):
    print('in the SECONDARY process, preparing to wait for 1 second')
    event.wait(1)
    print('in the SECONDARY process, preparing to raise the event')
    event.set()

if __name__ == '__main__':
    evt = mp.Event()
    print('preparing to wait 10 seconds in the PRIMARY process')
    mp.Process(target=delay_one_second, args=(evt,)).start()
    evt.wait(10)
    print('PRIMARY process, waking up')

This code (run as a script with the "python module.py" command inside cmd.exe) yields a surprising result.

The main process apparently waits only about one second before waking up. For that to happen, the secondary process must somehow be acting on an object that lives in the main process.

How can this be? I was expecting to have to use a multiprocessing.Manager() to share objects between processes, so how is this possible?

I mean, the processes are not threads; they shouldn't be using the same memory space. Does anyone have an idea what's going on here?

asked Feb 24 '26 by vlad-ardelean

2 Answers

The short answer is that the shared memory is not managed by a separate process; it's managed by the operating system itself.

You can see how this works if you spend some time browsing through the multiprocessing source. You'll see that an Event object uses a Semaphore and a Condition, both of which rely on the locking behavior provided by the SemLock object. This, in turn, wraps a _multiprocessing.SemLock object, which is implemented in C and depends on either sem_open (POSIX) or CreateSemaphore (Windows).

These are C functions that enable access to shared resources managed by the operating system itself -- in this case, named semaphores. They can be shared between threads or processes; the OS takes care of everything. When a new semaphore is created, it is given a handle. Then, when a new process that needs access to that semaphore is created, it is given a copy of the handle. It then passes that handle to sem_open or CreateSemaphore, and the operating system gives the new process access to the original semaphore.

So the memory is being shared, but it's being shared as part of the operating system's built-in support for synchronization primitives. In other words, in this case, you don't need to open a new process to manage the shared memory; the operating system takes over that task. But this is only possible because Event doesn't need anything more complex than a semaphore to work.
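To make the distinction concrete, here is a minimal sketch (the names `worker` and `demo_shared` are mine, not from the question): a multiprocessing.Value is backed by OS-managed shared memory and a semaphore, so the child's increment is visible to the parent, while a plain Python list passed the same way is merely copied into the child.

```python
import multiprocessing as mp

def worker(counter, plain_list):
    # `counter` is backed by OS-managed shared memory and a semaphore,
    # so this increment is visible to the parent process.
    with counter.get_lock():
        counter.value += 1
    # `plain_list` is an ordinary object: the child mutates only its own copy.
    plain_list.append('child was here')

def demo_shared():
    counter = mp.Value('i', 0)  # shared-memory int guarded by an OS lock
    plain = []
    p = mp.Process(target=worker, args=(counter, plain))
    p.start()
    p.join()
    return counter.value, len(plain)

if __name__ == '__main__':
    print(demo_shared())  # the counter changed; the plain list did not
```

The same mechanism is why the Event in the question works without a Manager: only the OS-level primitive is shared, not arbitrary Python objects.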

answered Feb 25 '26 by senderle


The documentation says that the multiprocessing module follows the threading API. My guess would be that it uses a mechanism similar to fork. If you fork a process, your OS creates a copy of the current process. That means it copies the heap and the stack, including all your variables and globals, and that's what you're seeing.

You can see it for yourself if you pass the function below to a new process.

def print_globals():
    print(globals())
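As a complementary sketch (the names here are mine, not from the answer), you can also check the other direction: the child works on a copy of the parent's globals, so mutating them in the child leaves the parent untouched.

```python
import multiprocessing as mp

DATA = {'value': 1}  # a module-level global, copied into the child

def mutate_global():
    # Runs in the child: changes only the child's copy of DATA.
    DATA['value'] = 99

def demo_copy():
    p = mp.Process(target=mutate_global)
    p.start()
    p.join()
    return DATA['value']  # still 1 in the parent

if __name__ == '__main__':
    print(demo_copy())
```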
answered Feb 25 '26 by Michal