I have a large array (~500k rows x 9 columns) that I would like to share among a number of parallel processes using Python's multiprocessing
module. I am using this SO answer to create my shared array, and I understand from this SO answer that the array is locked. However, in my case I never write concurrently to the same row, so the lock is superfluous and only increases processing time.
When I specify lock=False,
however, I get an error.
My code is this:
import ctypes
import multiprocessing
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 90, lock=False)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(-1, 9)
And the error is this:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-d89681d70c37> in <module>()
1 shared_array_base = multiprocessing.Array(ctypes.c_double, len(np.unique(value)) * 9, lock=False)
----> 2 shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
3 shared_array = shared_array.reshape(-1, 9)
AttributeError: 'c_double_Array_4314834' object has no attribute 'get_obj'
My question is how can I share a numpy array that is not locked each time I write to it?
I found the answer here, thanks to HYRY.
Passing lock=True
returns a wrapped object,
multiprocessing.sharedctypes.SynchronizedArray,
which exposes the underlying ctypes array through its .get_obj()
method. Passing lock=False
returns the raw ctypes array itself (e.g.
multiprocessing.sharedctypes.c_double_Array_10),
which has no .get_obj()
method and can be passed to np.ctypeslib.as_array directly.
Therefore the code to create an unlocked shared array is this:
shared_array_base = multiprocessing.Array(ctypes.c_double, 90, lock=False)
shared_array = np.ctypeslib.as_array(shared_array_base)
shared_array = shared_array.reshape(-1, 9)