Python multiprocessing making same object instance for every process

I have written a simple example to illustrate exactly what I'm banging my head against. There is probably some very simple explanation that I'm missing.

import time
import multiprocessing as mp
import os


class SomeOtherClass:
    def __init__(self):
        self.a = 'b'


class SomeProcessor(mp.Process):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        soc = SomeOtherClass()
        print("PID: ", os.getpid())
        print(soc)

if __name__ == "__main__":
    queue = mp.Queue()

    for n in range(10):
        queue.put(n)

    processes = []

    for proc in range(mp.cpu_count()):
        p = SomeProcessor(queue)
        p.start()
        processes.append(p)

    for p in processes:
        p.join()

Result is:

PID: 11853
<__main__.SomeOtherClass object at 0x7fa637d3f588>
PID: 11854
<__main__.SomeOtherClass object at 0x7fa637d3f588>
PID: 11855
<__main__.SomeOtherClass object at 0x7fa637d3f588>
PID: 11856
<__main__.SomeOtherClass object at 0x7fa637d3f588>

The object address is the same for all of them, even though each initialization happened in a new process. Can anyone point out what the problem is? Thanks.

I also wonder about the following behaviour: when I first initialize the object in the main process, cache some values on it, and then initialize the same class in every process, the processes inherit the values cached in the main process.

import time
import multiprocessing as mp
import os
import random

class SomeOtherClass:

    c = {}

    def get(self, a):
        if a in self.c:
            print('Retrieved cached value ...')
            return self.c[a]

        b = random.randint(1,999)

        self.c[a] = b

        return b


class SomeProcessor(mp.Process):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        pid = os.getpid()
        soc = SomeOtherClass()
        val = soc.get('new')
        print("Value from process {0} is {1}".format(pid, val))

if __name__ == "__main__":
    queue = mp.Queue()

    for n in range(10):
        queue.put(n)

    pid = os.getpid()
    soc = SomeOtherClass()
    val = soc.get('new')
    print("Value from main process {0} is {1}".format(pid, val))

    processes = []

    for proc in range(mp.cpu_count()):
        p = SomeProcessor(queue)
        p.start()
        processes.append(p)

    for p in processes:
        p.join()

The output here is:

Value from main process 13052 is 676
Retrieved cached value ...
Value from process 13054 is 676
Retrieved cached value ...
Value from process 13056 is 676
Retrieved cached value ...
Value from process 13057 is 676
Retrieved cached value ...
Value from process 13055 is 676
Mario Kirov asked Sep 07 '21 11:09



2 Answers

To expand on the comments and discussion:

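  • On Linux, multiprocessing defaults to the fork start method (through Python 3.13; Python 3.14 switched the default to forkserver). Forking a process means child processes get a copy-on-write view of the parent process's memory, laid out at the same virtual addresses. This is why globally created objects have the same address in the subprocesses. It is also why an object allocated at the same point in each child, as in your run(), typically lands at the same address: every child starts from an identical allocator state. The addresses are virtual and per-process, so equal addresses do not mean the processes share one object.
    • On macOS and Windows, the default start method is spawn; no objects are inherited from the parent in that case.
  • The subprocesses will get their own private copies of the objects as soon as they write to them (and internally in CPython, in fact, even when they merely read them, because the reference counter lives in the object header and is updated on access).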
  • A variable defined as
    class SomeClass:
        container = {}
    
    is class-level, not instance-level and will be shared between all instances of SomeClass. That is,
    a = SomeClass()
    b = SomeClass()
    print(a is b)  # False
    print(a.container is b.container is SomeClass.container)  # True
    a.container["x"] = True
    print("x" in b.container)  # True
    print("x" in SomeClass.container)  # True
    
    Because the class's state is copied into each subprocess at fork time, the container also seems to be shared across processes. However, anything written into the container in a subprocess will not appear in the parent or sibling processes. Only certain special multiprocessing types (and certain lower-level primitives) can span process boundaries.
  • To correctly separate that container between instances and processes, it will need to be instance-level:
    class SomeClass:
        def __init__(self):
            self.container = {}
    
    (However, of course, if a SomeClass is globally instantiated, and a process is forked, its state at the time of the fork will be available in subprocesses.)
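
To make the points above concrete, here is a minimal sketch, assuming a POSIX system where the fork start method is available. The names child_write and demo are made up for this illustration, and multiprocessing.Manager is just one of the "special multiprocessing types" mentioned above, not the only option.

```python
import multiprocessing as mp


class SomeClass:
    container = {}  # class-level: one dict per process, shared by all instances


def child_write(shared):
    # Runs in the forked child process.
    # The child starts with a copy of the parent's class state.
    shared["saw_parent_key"] = "parent_key" in SomeClass.container
    # This write only changes the child's private copy of the dict.
    SomeClass.container["child_key"] = True
    # This write is forwarded to the manager process, so the parent can see it.
    shared["child_key"] = True


def demo():
    ctx = mp.get_context("fork")  # assumption: fork is available (not on Windows)
    SomeClass.container["parent_key"] = True
    with ctx.Manager() as manager:
        shared = manager.dict()
        proc = ctx.Process(target=child_write, args=(shared,))
        proc.start()
        proc.join()
        return (
            shared["saw_parent_key"],             # child inherited the parent's entry
            "child_key" in SomeClass.container,   # child's write never reached the parent
            "child_key" in shared,                # manager dict does cross the boundary
        )


if __name__ == "__main__":
    print(demo())  # (True, False, True)
```

The plain class-level dict behaves like a snapshot taken at fork time, while the manager dict is a proxy whose reads and writes go through a separate server process.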
AKX answered Nov 15 '22 00:11


tldr: They're actually not the same instance, so don't worry about that.

Well, that's interesting. The memory addresses they print are exactly the same, but the instances are definitely different. If we modify the code like this:

import time
import multiprocessing as mp
import os


class SomeOtherClass:
    def __init__(self, num):
        self.a = num  # <-- Let's identify the instance with the pid
    
    def __str__(self):
        return f"I'm number {self.a}"  # <-- Better representation of the instance


class SomeProcessor(mp.Process):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        soc = SomeOtherClass(os.getpid())  # <-- Use the PID to instantiate different objects
        print("PID: ", os.getpid())
        print(soc)
        time.sleep(1)
        print(soc)  # <-- Give it a second and print again

if __name__ == "__main__":
    queue = mp.Queue()

    for n in range(10):
        queue.put(n)

    processes = []

    for proc in range(mp.cpu_count()):
        p = SomeProcessor(queue)
        p.start()
        processes.append(p)

    for p in processes:
        p.join()

We can see that the instances are definitely different, and they aren't being modified, because after the time.sleep() they still have their attributes unchanged:

PID:  668424
I'm number 668424
PID:  668425
I'm number 668425
PID:  668426
I'm number 668426
...
I'm number 668435
I'm number 668424
I'm number 668426
...

Yet, if we remove the __str__ function, I still see the same memory address:

<__main__.SomeOtherClass object at 0x7f3e08d83bb0>
PID:  669008
<__main__.SomeOtherClass object at 0x7f3e08d83bb0>
PID:  669009
<__main__.SomeOtherClass object at 0x7f3e08d83bb0>
PID:  669010
...
<__main__.SomeOtherClass object at 0x7f3e08d83bb0>
<__main__.SomeOtherClass object at 0x7f3e08d83bb0>
<__main__.SomeOtherClass object at 0x7f3e08d83bb0>
...

To be honest, I don't really know why this happens, so other people may be able to help you more. As the user Booboo has said, you're seeing this because Linux uses fork to start a new process. I ran this on a Linux machine too. If Windows had been used, the memory addresses would most likely be different.
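
One way to check this without relying on the default repr is to have each child report its PID, the value stored on its instance, and the instance's address back to the parent. This is only a sketch: report and collect are hypothetical helper names, and it assumes a platform where the fork start method exists.

```python
import multiprocessing as mp
import os


class SomeOtherClass:
    def __init__(self, num):
        self.a = num


def report(result_queue):
    soc = SomeOtherClass(os.getpid())
    # In CPython, id() is the object's virtual address; it is only
    # meaningful inside the process that produced it.
    result_queue.put((os.getpid(), soc.a, hex(id(soc))))


def collect(start_method, workers=3):
    ctx = mp.get_context(start_method)
    result_queue = ctx.Queue()
    procs = [ctx.Process(target=report, args=(result_queue,)) for _ in range(workers)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = [result_queue.get() for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for pid, value, address in collect("fork"):
        print(pid, value, address)
```

With "fork", the addresses will often coincide while the PIDs and stored values differ, which shows these are separate objects in separate address spaces. Passing "spawn" instead starts a fresh interpreter per child, so the addresses are then likely, though not guaranteed, to differ.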

Shinra tensei answered Nov 15 '22 01:11