 

Is shared readonly data copied to different processes for multiprocessing?

The piece of code that I have looks somewhat like this:

from multiprocessing import Pool

glbl_array = # a 3 Gb array

def my_func(args, def_param=glbl_array):
    # do stuff on args and def_param

if __name__ == '__main__':
    pool = Pool(processes=4)
    pool.map(my_func, range(1000))

Is there a way to make sure (or encourage) that the different processes do not get a copy of glbl_array but share it? If there is no way to stop the copy, I will go with a memmapped array, but my access patterns are not very regular, so I expect memmapped arrays to be slower. The above seemed like the first thing to try. This is on Linux. I just wanted some advice from Stack Overflow and do not want to annoy the sysadmin. Do you think it will help if the second parameter is a genuine immutable object like glbl_array.tostring()?

asked Apr 05 '11 by san


2 Answers

You can use the shared memory stuff from multiprocessing together with Numpy fairly easily:

import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

#-- edited 2015-05-01: the assert check below checks the wrong thing
#   with recent versions of Numpy/multiprocessing. That no copy is made
#   is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))

    print(shared_array)

which prints

[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]

However, Linux has copy-on-write semantics on fork(), so even without using multiprocessing.Array, the data will not be copied unless it is written to.
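To illustrate, here is a minimal sketch of relying on copy-on-write alone, under the assumption that the fork start method is used (the default on Linux). The names glbl_array and my_func mirror the question, and the array is scaled down from 3 GB for the example:

import multiprocessing
import numpy as np

# Stand-in for the real 3 GB array; defined at module level so that
# forked workers inherit it rather than receiving a pickled copy.
glbl_array = np.zeros(10**6)

def my_func(i):
    # Reading the inherited array does not copy it: with fork()'s
    # copy-on-write, pages are only duplicated if a process writes to them.
    return glbl_array[i] + i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)  # uses fork on Linux
    print(pool.map(my_func, range(10)))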

answered Sep 21 '22 by pv.


The following code works on Win7 and Mac (it may work on Linux as well, but has not been tested there):

import multiprocessing
import ctypes
import numpy as np

shared_array = None

def init(shared_array_base):
    global shared_array
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)

# Parallel processing
def my_func(i):
    shared_array[i, :] = i

if __name__ == '__main__':
    shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)

    pool = multiprocessing.Pool(processes=4, initializer=init,
                                initargs=(shared_array_base,))
    pool.map(my_func, range(10))

    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)
    print(shared_array)
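The key difference from the first answer is that the shared Array is handed to each worker through the pool's initializer rather than inherited as a module-level global. This matters with the spawn start method (the default on Windows, and on macOS since Python 3.8), where workers start as fresh interpreters that do not inherit the parent's globals; initargs are delivered once per worker at startup, which is the supported way to hand over a multiprocessing.Array, since synchronized wrappers cannot simply be passed as pool.map arguments.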
answered Sep 21 '22 by taku-y