
Object identity through multiprocess spawn

I have been experimenting with the multiprocessing module in Python, and I was wondering how the arguments of the different parallelization methods are treated in the spawned processes. Here is the code I used:

import os
import time
import multiprocessing


class StateClass:
    def __init__(self):
        self.state = 0

    def __call__(self):
        return f"I am {id(self)}: {self.state}"


CONTEXT = multiprocessing.get_context("fork")

nb_workers = 2

stato = StateClass()


def wrapped_work_function(a1, a2, sss, qqq):
    time.sleep(a1 + 1)
    if a1 == 0:
        sss.state = 0
    else:
        sss.state = 123
    for eee in a2:
        time.sleep(a1 + 1)
        sss.state += eee
        print(
            f"Worker {a1} in process {os.getpid()} (parent process {os.getppid()}): {eee}, {sss()}"
        )
    return sss


print("main", id(stato), stato)

manager = CONTEXT.Manager()
master_workers_queue = manager.Queue()

work_args_list = [
    (
        worker_index,
        [iii for iii in range(4)],
        stato,
        master_workers_queue,
    )
    for worker_index in range(nb_workers)
]

pool = CONTEXT.Pool(nb_workers)
result = pool.starmap_async(wrapped_work_function, work_args_list)

pool.close()
pool.join()
print("Finish")
bullo = result.get(timeout=100)
bullo.append(stato)
for sss in bullo:
    print(sss, id(sss), sss.state)

from which I get, for example, the following output:

main 140349939506416 <__main__.StateClass object at 0x7fa5c449dcf0>
Worker 0 in process 9075 (parent process 9047): 0, I am 140350069832528: 0
Worker 0 in process 9075 (parent process 9047): 1, I am 140350069832528: 1
Worker 1 in process 9077 (parent process 9047): 0, I am 140350069832528: 123
Worker 0 in process 9075 (parent process 9047): 2, I am 140350069832528: 3
Worker 0 in process 9075 (parent process 9047): 3, I am 140350069832528: 6
Worker 1 in process 9077 (parent process 9047): 1, I am 140350069832528: 124
Worker 1 in process 9077 (parent process 9047): 2, I am 140350069832528: 126
Worker 1 in process 9077 (parent process 9047): 3, I am 140350069832528: 129
Finish
<__main__.StateClass object at 0x7fa5c43ac190> 140349938516368 6
<__main__.StateClass object at 0x7fa5c43ac4c0> 140349938517184 129
<__main__.StateClass object at 0x7fa5c449dcf0> 140349939506416 0

The initial class instance stato has id 140349939506416 and keeps it throughout its lifetime, as I would expect. Within the starmap_async call I do indeed get two different instances of the same class (one for each worker/process), which I can modify and which retain their state attribute until the end of the script. However, the id of these instances is initially the same (140350069832528), and at the end of the script both of them have yet another id, which also differs from that of the original instance. Doesn't having the same id mean that they have the same address in memory? How is it then possible that they retain different states? Is this behavior related to the fork context?

Neo asked Dec 04 '25 04:12


1 Answer

First of all, when I run this (Debian Linux, Python 3.9.7), I do not find that the ids of the sss instances are the same for both subprocesses:

main 140614771273680 <__main__.StateClass object at 0x7fe36d7defd0>
Worker 0 in process 19 (parent process 13): 0, I am 140614770671776: 0
Worker 0 in process 19 (parent process 13): 1, I am 140614770671776: 1
Worker 1 in process 20 (parent process 13): 0, I am 140614761373648: 123
Worker 0 in process 19 (parent process 13): 2, I am 140614770671776: 3
Worker 0 in process 19 (parent process 13): 3, I am 140614770671776: 6
Worker 1 in process 20 (parent process 13): 1, I am 140614761373648: 124
Worker 1 in process 20 (parent process 13): 2, I am 140614761373648: 126
Worker 1 in process 20 (parent process 13): 3, I am 140614761373648: 129
Finish
<__main__.StateClass object at 0x7fe36ce7b7f0> 140614761428976 6
<__main__.StateClass object at 0x7fe36ce7b520> 140614761428256 129
<__main__.StateClass object at 0x7fe36d7defd0> 140614771273680 0

Even though you are forking the new processes, the worker function does not operate on the inherited stato directly: the stato instances in the work_args_list list are passed to it as sss. Arguments reach a pool worker function, which runs in a different process/address space, by way of pickle, which serializes the instance in the parent and de-serializes it in the worker, thus making a copy that will in general have a different id. Separately, because you are using the fork method, each process also inherits the global variable stato, and this inherited object should have the same id in all processes/address spaces. We can verify this if we modify wrapped_work_function to print out the id of stato thus:

def wrapped_work_function(a1, a2, sss, qqq):
    print('The id of the inherited stato is', id(stato))
    time.sleep(a1 + 1)
    if a1 == 0:
        sss.state = 0
    else:
        sss.state = 123
    for eee in a2:
        time.sleep(a1 + 1)
        sss.state += eee
        print(
            f"Worker {a1} in process {os.getpid()} (parent process {os.getppid()}): {eee}, {sss()}"
        )
    return sss

Then the printout is:

main 140456701534160 <__main__.StateClass object at 0x7fbe9fcd1fd0>
The id of the inherited stato is 140456701534160
The id of the inherited stato is 140456701534160
Worker 0 in process 43 (parent process 37): 0, I am 140456700920112: 0
Worker 0 in process 43 (parent process 37): 1, I am 140456700920112: 1
Worker 1 in process 44 (parent process 37): 0, I am 140456700920112: 123
Worker 0 in process 43 (parent process 37): 2, I am 140456700920112: 3
Worker 0 in process 43 (parent process 37): 3, I am 140456700920112: 6
Worker 1 in process 44 (parent process 37): 1, I am 140456700920112: 124
Worker 1 in process 44 (parent process 37): 2, I am 140456700920112: 126
Worker 1 in process 44 (parent process 37): 3, I am 140456700920112: 129
Finish
<__main__.StateClass object at 0x7fbe9f36e880> 140456691689600 6
<__main__.StateClass object at 0x7fbe9f36eb20> 140456691690272 129
<__main__.StateClass object at 0x7fbe9fcd1fd0> 140456701534160 0

All address spaces see the same id for stato, namely 140456701534160. Within any one process the inherited stato is still alive at that address, so the sss that the process receives, being a separate copy of stato, cannot have the same id as stato. That is what I see when I run the code. The sss copies living in different address spaces, however, may or may not have the same id as one another; neither outcome is guaranteed (on this second run they happened to be the same).
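The point about the inherited stato can be checked in isolation. The following is only a minimal sketch, assuming a POSIX system where the fork start method is available; the helper names child and run_demo are made up for this illustration and do not come from the question. Each forked child changes its inherited stato and reports the id it sees back through a pipe.

```python
import multiprocessing
import os


class StateClass:
    def __init__(self):
        self.state = 0


stato = StateClass()


def child(conn, value):
    # Hypothetical helper: runs in the forked child, changes the inherited
    # global and reports what this process sees.
    stato.state = value
    conn.send((os.getpid(), id(stato), stato.state))
    conn.close()


def run_demo():
    # Hypothetical helper: forks one child per value and collects the reports.
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    reports = []
    for value in (6, 129):
        parent_conn, child_conn = ctx.Pipe()
        proc = ctx.Process(target=child, args=(child_conn, value))
        proc.start()
        reports.append(parent_conn.recv())
        proc.join()
    return reports


if __name__ == "__main__":
    for pid, ident, state in run_demo():
        # Same id as in the parent, but a state private to that child.
        print(pid, ident == id(stato), state)
    # The parent's own stato was never touched.
    print("parent state:", stato.state)
```

Each child should report the same id as the parent together with its own state, while the parent still prints 0 for its own instance.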

But even if the sss instances have the same id and therefore the same address (in CPython, id is the object's memory address), they are two instances existing in two different processes and thus in two different address spaces; the address is a virtual one that is only meaningful within its own process. That is why they can maintain distinct states. As an aside, when your worker function returns sss, it is passed back to the main process using pickle, which serializes and de-serializes the instance, in effect making a copy of the original. That is why the returned instances have different ids.
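The copying effect of pickle can be seen without any pool at all. This is a small sketch rather than what the pool does internally in full; it reuses a cut-down StateClass from the question and only calls pickle.dumps and pickle.loads directly.

```python
import pickle


class StateClass:
    def __init__(self):
        self.state = 0


original = StateClass()
original.state = 5

# Serialize and de-serialize, as happens when an argument or a return
# value crosses the process boundary.
clone = pickle.loads(pickle.dumps(original))

# The clone is a new object. Both are alive at once, so their ids differ.
print(clone is original)            # False
print(clone.state)                  # 5

# Changing the clone leaves the original untouched.
clone.state += 1
print(original.state, clone.state)  # 5 6
```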

As an additional aside: you have bullo = result.get(timeout=100), which tests for a possible timeout. But this statement is preceded by calls to pool.close() and pool.join(), and pool.join() waits for all submitted tasks to complete. Thus when you call result.get, the tasks are guaranteed to have completed and the call can never raise a timeout exception.
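A short sketch of that ordering, again assuming the fork start method; slow_square and run_pool are hypothetical names used only for this illustration. After close() and join() the result is already ready, so even a timeout of zero is enough.

```python
import multiprocessing
import time


def slow_square(x):
    # Hypothetical worker function standing in for real work.
    time.sleep(0.2)
    return x * x


def run_pool():
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    pool = ctx.Pool(2)
    result = pool.map_async(slow_square, [1, 2, 3])
    pool.close()  # no further tasks may be submitted
    pool.join()   # blocks until every submitted task has finished
    # The result is already available here, so get() returns at once.
    return result.ready(), result.get(timeout=0)


if __name__ == "__main__":
    print(run_pool())
```

If the timeout is meant to guard against a hung worker, result.get(timeout=...) would have to be called before pool.join() rather than after it.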

Booboo answered Dec 05 '25 23:12