Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Share a list between different processes?

I have the following problem. I have written a function that takes a list as input and creates a dictionary for each element in the list. I then want to append this dictionary to a new list, so I get a list of dictionaries. I am trying to spawn multiple processes for this. My problem here is that I want the different processes to access the list of dictionaries as it is updated by other processes, for example to print something once the has reached a certain length.

My example would be like this:

import multiprocessing

list=['A', 'B', 'C', 'D', 'E', 'F']

def do_stuff(element):
    element_dict={}
    element_dict['name']=element
    new_list=[]
    new_list.append(element_dict)
    if len(new_list)>3:
        print 'list > 3'

###Main###
pool=multiprocessing.Pool(processes=6)
pool.map(do_stuff, list)
pool.close()

Right now my problem is that each process creates its own new_list. Is there a way to share the list between processes, such that all dictionaries are appended to the same list? Or is the only way to define the new_list outside of the function?

like image 493
sequence_hard Avatar asked Nov 16 '16 10:11

sequence_hard


People also ask

How to communicate between processes in Python?

A simple way to communicate between process with multiprocessing is to use a Queue to pass messages back and forth. Any pickle-able object can pass through a Queue. This short example only passes a single message to a single worker, then the main process waits for the worker to finish.

What is multiprocessing Queue?

The multiprocessing. Queue provides a way to allow these producer and consumer processes to communicate data with each other. First, we can define the function to be executed by the producer process. The task will iterate ten times in a loop.


2 Answers

One way is to use a manager object and create your shared list object from it:

from multiprocessing import Manager, Pool

input_list = ['A', 'B', 'C', 'D', 'E', 'F']

manager = Manager()
shared_list = manager.list()

def do_stuff(element):
    element_dict = {}
    element_dict['name'] = element
    shared_list.append(element_dict)
    if len(shared_list) > 3:
        print('list > 3')

pool = Pool(processes=6)
pool.map(do_stuff, input_list)
pool.close()

Remember, unlike threads, processes do not share memory space. (When spawned, each process gets its own copy of the memory footprint of the spawning process, and then runs with it.) So they can only communicate via some form of IPC (interprocess communication). In Python, one such method is multiprocessing.Manager and the data structures it exposes, e.g. list or dict. These are used in code as easily as their built-in equivalents, but under the hood utilize some form of IPC (sockets probably).

Edit Feb 1, 2022: Removed unneeded global shared_list declaration from the function, since the object is not being replaced.

like image 117
Velimir Mlaker Avatar answered Sep 22 '22 06:09

Velimir Mlaker


the following is from python documentation:

from multiprocessing import shared_memory
a = shared_memory.ShareableList(['howdy', b'HoWdY', -273.154, 100, None, True, 42])
[ type(entry) for entry in a ]
[<class 'str'>, <class 'bytes'>, <class 'float'>, <class 'int'>, <class 'NoneType'>, <class 'bool'>, <class 'int'>]
a[2]
-273.154
a[2] = -78.5
a[2]
-78.5
a[2] = 'dry ice'  # Changing data types is supported as well
a[2]
'dry ice'
a[2] = 'larger than previously allocated storage space'
Traceback (most recent call last):
  ...
ValueError: exceeds available storage for existing str
a[2]
'dry ice'
len(a)
7
a.index(42)
6
a.count(b'howdy')
0
a.count(b'HoWdY')
1
a.shm.close()
a.shm.unlink()
del a  # Use of a ShareableList after call to unlink() is unsupported
like image 36
Takintayo Akinbiyi Avatar answered Sep 23 '22 06:09

Takintayo Akinbiyi