Multiprocessing is a wonderful tool, but it is not so straightforward to use large memory chunks with it. You can load chunks in each process and dump the results to disk, but sometimes you need to keep the results in memory, and on top of that use the fancy numpy functionality.
I have read/googled a lot and came across some answers:
Use numpy array in shared memory for multiprocessing
Share Large, Read-Only Numpy Array Between Multiprocessing Processes
Python multiprocessing global numpy arrays
How do I pass large numpy arrays between python subprocesses without saving to disk?
Etc., etc.
They all have drawbacks: not-so-mainstream libraries (sharedmem), globally stored variables, not-so-easy-to-read code, pipes, and so on.
My goal was to seamlessly use numpy in my workers without worrying about conversions and stuff.
After many trials I came up with this, and it works on my Ubuntu 16.04, Python 3.6, 16 GB, 8-core machine. I took a lot of "shortcuts" compared to previous approaches: no global shared state, no raw memory pointers that need to be converted to numpy inside the workers, large numpy arrays passed as process arguments, etc.
The full code is in the pastebin link above, but I will put a few snippets here.
Some imports:
import numpy as np
import multiprocessing as mp
import multiprocessing.sharedctypes
import ctypes
Allocate some shared memory and wrap it in a numpy array:
def create_np_shared_array(shape, dtype, ctype):
    # Total number of elements the shared buffer must hold
    size = int(np.prod(shape))
    shared_mem_chunk = mp.sharedctypes.RawArray(ctype, size)
    numpy_array_view = np.frombuffer(shared_mem_chunk, dtype).reshape(shape)
    return numpy_array_view
Create the shared arrays and put something in them:
src = np.random.rand(*SHAPE).astype(np.float32)
src_shared = create_np_shared_array(SHAPE, np.float32, ctypes.c_float)
dst_shared = create_np_shared_array(SHAPE, np.float32, ctypes.c_float)
src_shared[:] = src[:]  # Some numpy ops accept an 'out' array where to store the results
Spawn the process:
p = mp.Process(target=lengthly_operation, args=(src_shared, dst_shared, k, k + STEP))
p.start()
p.join()
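The worker itself is only in the pastebin; purely as an illustration, here is a minimal sketch of what a lengthly_operation with this signature could look like (the loop body is a placeholder, not the actual computation, and it reuses the imports above):

def lengthly_operation(src, dst, start, end):
    # src and dst are numpy views over shared memory, so anything written
    # into dst here is visible to the parent process after join()
    for i in range(start, end):
        dst[i] = np.sqrt(src[i]) + 1.0  # stand-in for the real per-row work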
Here are some results (see pastebin code for full reference):
Serial version: allocate mem 2.3741257190704346 exec: 17.092209577560425 total: 19.46633529663086 Succes: True
Parallel with trivial np: allocate mem 2.4535582065582275 spawn process: 0.00015354156494140625 exec: 3.4581971168518066 total: 5.911908864974976 Succes: False
Parallel with shared mem np: allocate mem 4.535916328430176 (pure alloc:4.014216661453247 copy: 0.5216996669769287) spawn process: 0.00015664100646972656 exec: 3.6783478260040283 total: 8.214420795440674 Succes: True
I also ran cProfile (why two extra seconds when allocating shared mem?) and realized that there are some calls to tempfile.py, e.g. {method 'write' of '_io.BufferedWriter' objects}.
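For reference, the profile can be collected with something along these lines (a generic cProfile invocation, not necessarily how the pastebin code does it):

import cProfile

# Profile just the shared allocation to see where the extra seconds go
cProfile.run('create_np_shared_array(SHAPE, np.float32, ctypes.c_float)', sort='cumtime')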
Questions
Notes
Allocation of the shared array is slow, because apparently it is written to disk first so that it can be shared through mmap. For reference see heap.py and sharedctypes.py. This is why tempfile.py shows up in the profiler. I think the advantage of this approach is that the shared memory is cleaned up in case of a crash, which cannot be guaranteed with POSIX shared memory.
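If you want to see where that file-backed memory lives, here is a quick check (a sketch that relies on CPython's Unix implementation of heap.py, which creates the arena's backing file via tempfile inside multiprocessing's temp directory; details may differ between versions):

import multiprocessing.util

# heap.py backs each shared-memory arena with a temporary file created in
# this per-process directory (CPython/Unix implementation detail)
print(multiprocessing.util.get_temp_dir())  # e.g. /tmp/pymp-<random suffix>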
There is no pickling happening with your code, thanks to fork, and, as you said, the memory is inherited. The reason the second run doesn't work is that the child processes are not allowed to write into the memory of the parent. Instead, private pages are allocated on the fly, only to be discarded when the child process ends.
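A tiny experiment (assuming the default fork start method on Linux) shows the effect, and why the trivial-numpy run reports Succes: False: the child's write lands in a private copy-on-write page that the parent never sees:

import numpy as np
import multiprocessing as mp

def worker(arr):
    arr[0] = 42.0  # under fork this modifies a private copy-on-write page

if __name__ == '__main__':
    plain = np.zeros(4, dtype=np.float32)
    p = mp.Process(target=worker, args=(plain,))
    p.start()
    p.join()
    print(plain[0])  # prints 0.0 -- the parent never sees the child's write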
I only have one suggestion: you don't have to specify a ctype yourself; the right type can be figured out from the numpy dtype through np.ctypeslib._typecodes. Or just use c_byte for everything and use the dtype's itemsize to figure out the size of the buffer; it will be cast by numpy anyway.
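For example, a minimal sketch of the second idea (allocating raw bytes and letting numpy reinterpret them, so callers only pass a dtype; it assumes the same imports as in the question):

def create_np_shared_array(shape, dtype):
    # Allocate the buffer as raw bytes; numpy reinterprets it with the
    # requested dtype, so no explicit ctype argument is needed
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    shared_buf = mp.sharedctypes.RawArray(ctypes.c_byte, nbytes)
    return np.frombuffer(shared_buf, dtype=dtype).reshape(shape)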