
Python multiprocessing: sharing data between processes

I am trying to use multiprocessing for the first time and am running into some fairly basic issues. I have a toy example below, where two processes add data to a list:

from multiprocessing import Process

def add_process(all_nums_class, numbers_to_add):
    for number in numbers_to_add:
        all_nums_class.all_nums_list.append(number)

class AllNumsClass:
    def __init__(self):
        self.all_nums_list = []

all_nums_class = AllNumsClass()

p1 = Process(target=add_process, args=(all_nums_class, [1,3,5]))
p1.start()

p2 = Process(target=add_process, args=(all_nums_class, [2,4,6]))
p2.start()

all_nums_class.all_nums_list

I'd like to have the all_nums_class shared between these processes so that they can both add to its all_nums_list - so the result should be

[1,2,3,4,5,6]

instead of what I'm currently getting which is just good old

[]

Could anybody please advise? I have played around with namespace a bit but haven't yet made it work here.

I feel I'd better mention (in case it makes a difference) that I'm doing this in a Jupyter notebook.

asked Oct 17 '25 by Chris J Harris

1 Answer

You can either use a multiprocessing Queue or a Pipe to share data between processes. Queues are both thread and process safe. You will have to be more careful when using a Pipe as the data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
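For reference, here's a minimal sketch of the Pipe approach (reusing the add_process name from the question; this is one possible arrangement, not the only one). Each child gets its own pipe and writes only to its own end, which sidesteps the corruption risk described above:

import multiprocessing as mp

def add_process(conn, numbers_to_add):
    # Each child writes only to its own end of its own pipe,
    # so no two writers ever share a connection.
    for number in numbers_to_add:
        conn.send(number)
    conn.close()

if __name__ == '__main__':
    parent_conn1, child_conn1 = mp.Pipe()
    parent_conn2, child_conn2 = mp.Pipe()

    p1 = mp.Process(target=add_process, args=(child_conn1, [1, 3, 5]))
    p2 = mp.Process(target=add_process, args=(child_conn2, [2, 4, 6]))
    for p in (p1, p2):
        p.start()
    for p in (p1, p2):
        p.join()

    # Drain whatever each child sent; poll() reports pending data.
    output = []
    for conn in (parent_conn1, parent_conn2):
        while conn.poll():
            output.append(conn.recv())
    print(output)  # e.g. [1, 3, 5, 2, 4, 6]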

Currently, your implementation spawns two separate processes, each with its own copy of the object. So you're essentially creating three AllNumsClass objects: one in your main program, one in p1, and one in p2. Since processes are independent and don't share the same memory space, each child appends correctly, but only to its own self.all_nums_list. That's why when you print all_nums_class.all_nums_list in your main program, you're printing the main process's self.all_nums_list, which is an empty list. To share the data and have the processes append to the same list, I would recommend using a Queue.

Example using Queue and Process

import multiprocessing as mp

def add_process(queue, numbers_to_add):
    for number in numbers_to_add:
        queue.put(number)

class AllNumsClass:
    def __init__(self):
        self.queue = mp.Queue()
    def get_queue(self):
        return self.queue

if __name__ == '__main__':
    
    all_nums_class = AllNumsClass()

    processes = []
    p1 = mp.Process(target=add_process, args=(all_nums_class.get_queue(), [1,3,5]))
    p2 = mp.Process(target=add_process, args=(all_nums_class.get_queue(), [2,4,6]))

    processes.append(p1)
    processes.append(p2)
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    output = []
    # Queue.qsize() is not implemented on some platforms (e.g. macOS);
    # since both producers have been joined, empty() is a safe way to drain.
    while not all_nums_class.get_queue().empty():
        output.append(all_nums_class.get_queue().get())
    print(output)

This implementation is asynchronous: the two processes put their numbers on the queue concurrently, so the interleaving is nondeterministic and you may get a different output each run.

Example outputs

[1, 2, 3, 5, 4, 6]

[1, 3, 5, 2, 4, 6]

[2, 4, 6, 1, 3, 5]

[2, 1, 4, 3, 5, 6]

A simpler way to collect results, ordered or unordered, is the mp.Pool class, specifically the Pool.apply and Pool.apply_async functions. Pool.apply blocks until the submitted call has finished, which is quite useful when we want to obtain results in a particular order. In contrast, Pool.apply_async submits all the calls at once and retrieves results as soon as they are finished. An additional difference is that we need to call the get method on each result of a Pool.apply_async call in order to obtain the return values of the finished processes.
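To illustrate the difference, here's a rough sketch (the square worker is a stand-in for illustration, not from the question):

import multiprocessing as mp

def square(x):
    # stand-in worker used only for illustration
    return x * x

if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:
        # apply blocks until each call returns, so results arrive in order
        ordered = [pool.apply(square, args=(x,)) for x in [1, 2, 3]]

        # apply_async submits everything at once and returns AsyncResult
        # objects; call get() on each one to collect the return values
        handles = [pool.apply_async(square, args=(x,)) for x in [1, 2, 3]]
        results = [h.get() for h in handles]

    print(ordered)  # [1, 4, 9]
    print(results)  # [1, 4, 9]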

answered Oct 19 '25 by nathancy