Sharing a ctypes numpy array without lock when using multiprocessing

I have a large array (~500k rows x 9 columns) that I would like to share between a number of parallel processes using Python's multiprocessing module. I am using this SO answer to create my shared array, and I understand from this SO answer that the array is locked. However, since I never write to the same row concurrently, a lock is superfluous in my case and only increases processing time.

However, when I specify lock=False, I get an error.

My code is this:

import ctypes
import multiprocessing
import numpy as np
shared_array_base = multiprocessing.Array(ctypes.c_double, 90, lock=False)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())  # raises AttributeError
shared_array = shared_array.reshape(-1, 9)

And the error is this:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-d89681d70c37> in <module>()
      1 shared_array_base = multiprocessing.Array(ctypes.c_double, len(np.unique(value)) * 9, lock=False)
----> 2 shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
      3 shared_array = shared_array.reshape(-1, 9)

AttributeError: 'c_double_Array_4314834' object has no attribute 'get_obj'

My question: how can I share a NumPy array without it being locked each time I write to it?

asked Oct 18 '22 at 04:10 by kungphil

1 Answer

I found the answer here, thanks to HYRY.

Specifying lock=True (the default) returns a wrapped object:

multiprocessing.sharedctypes.SynchronizedArray

Specifying lock=False returns a raw ctypes array, which does not have the .get_obj() method:

multiprocessing.sharedctypes.c_double_Array_10
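A quick way to see this difference is to inspect both objects directly. The sketch below creates a 10-element array each way and checks which one exposes .get_obj(); the variable names locked and raw are just illustrative.

```python
import ctypes
import multiprocessing

# lock=True is the default, so this returns a SynchronizedArray wrapper
locked = multiprocessing.Array(ctypes.c_double, 10)
# lock=False returns the raw ctypes array itself, with no wrapper
raw = multiprocessing.Array(ctypes.c_double, 10, lock=False)

print(type(locked).__name__)            # SynchronizedArray
print(type(raw).__name__)               # c_double_Array_10
print(hasattr(locked, "get_obj"))       # True
print(hasattr(raw, "get_obj"))          # False
# The wrapper's get_obj() hands back the same kind of raw array
print(type(locked.get_obj()).__name__)  # c_double_Array_10
```

As far as I can tell from the CPython implementation, the wrapper only acquires its lock for accesses made through the wrapper itself (indexing it, or explicitly via get_lock()); a NumPy view built from get_obj() writes to the underlying buffer without taking the lock.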

Therefore, the code to create an unlocked array is:

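import ctypes
import multiprocessing
import numpy as np
shared_array_base = multiprocessing.Array(ctypes.c_double, 90, lock=False)  # raw ctypes array
shared_array = np.ctypeslib.as_array(shared_array_base)  # wrap it directly, no .get_obj()
shared_array = shared_array.reshape(-1, 9)  # 10 rows x 9 columns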
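To put the unlocked array to use, here is a minimal sketch, assuming a POSIX system where the "fork" start method is available. Each worker process re-wraps the raw buffer with np.ctypeslib.as_array and writes only its own block of rows, so no lock is needed. The helper names fill_rows and build_shared_array and the 10 x 9 size are hypothetical, chosen for illustration.

```python
import ctypes
import multiprocessing

import numpy as np

N_ROWS, N_COLS = 10, 9  # hypothetical size; the question's real array is ~500k x 9


def fill_rows(shared_base, start, stop):
    # Re-wrap the raw shared buffer in the worker: no copy and no lock.
    arr = np.ctypeslib.as_array(shared_base).reshape(-1, N_COLS)
    # Each worker writes only rows [start, stop), so no two processes
    # ever touch the same memory.
    for row in range(start, stop):
        arr[row, :] = row


def build_shared_array(n_workers=2):
    # Assumption: "fork" start method (POSIX only), so children inherit
    # the shared memory without re-importing the main module.
    ctx = multiprocessing.get_context("fork")
    shared_base = ctx.Array(ctypes.c_double, N_ROWS * N_COLS, lock=False)
    # Split the row indices into contiguous, non-overlapping blocks.
    bounds = np.linspace(0, N_ROWS, n_workers + 1).astype(int)
    procs = [
        ctx.Process(target=fill_rows, args=(shared_base, int(lo), int(hi)))
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent sees every worker's writes through the same buffer.
    return np.ctypeslib.as_array(shared_base).reshape(-1, N_COLS)


if __name__ == "__main__":
    result = build_shared_array()
    print(result[:, 0])
```

Passing the raw array as a Process argument works here because the children are forked and inherit the shared memory. multiprocessing.RawArray(ctypes.c_double, 90) gives the same kind of unlocked raw array as Array(..., lock=False). Correctness still relies on the row ranges never overlapping.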
answered Oct 20 '22 at 21:10 by kungphil