I am trying to load HDF5 data from a memory cache (memcached) or the network, and then query it (read only) from multiple Python processes, without making a separate copy of the whole data set. Intuitively I would like to mmap the image (as it would appear on disk) into the multiple processes, and then query it from Python.
I am finding this difficult to achieve, hence the question. Pointers/corrections appreciated.
PyTables offers

File.get_file_image()

which would seem to get the file image. What I don't see is how to construct a new File / FileNode from a memory image rather than a disk file.

ndarray.__new__(buffer=...)

says it will copy the data, and numpy views can only seem to be constructed from existing ndarrays, not raw buffers.

Shared memory in multiprocessing is ctypes-based (with the Value wrapper to help a little). If I use ctypes directly I can read my mmap'd data without issue, but I would lose all the structural information and help from numpy/pandas/pytables to query it.

I think the situation should be updated now.
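As a side note on the buffer point: numpy can in fact wrap a raw buffer without copying, via np.frombuffer. A minimal sketch, using an anonymous mmap as a stand-in for the cached image (size and dtype are made up for illustration):

```python
import mmap

import numpy as np

# Anonymous mmap standing in for the memcached / network file image.
buf = mmap.mmap(-1, 8 * 1000)               # room for 1000 float64 values

# Zero-copy: the ndarray is a view over the mmap'd bytes.
arr = np.frombuffer(buf, dtype=np.float64)
arr[:] = np.arange(1000.0)                  # writes land in the mmap itself

# A fresh view over the same buffer sees the data, confirming no copy.
first = np.frombuffer(buf, dtype=np.float64, count=1)[0]
print(first)   # 0.0
```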
If a disk file is acceptable, NumPy now has a standard, dedicated ndarray subclass: numpy.memmap
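A minimal sketch of how that looks (the file name cache.dat is made up for illustration). Multiple reader processes opening the same file with mode="r" share the OS page cache rather than each holding a private copy:

```python
import numpy as np

# Writer: mode="w+" creates the backing file and maps it.
data = np.memmap("cache.dat", dtype=np.float64, mode="w+", shape=(1000,))
data[:] = np.arange(1000.0)   # writes go through the mapping
data.flush()                  # sync dirty pages to the backing file

# What a read-only query process would do with the same file:
view = np.memmap("cache.dat", dtype=np.float64, mode="r", shape=(1000,))
print(view[0], view[999])     # 0.0 999.0
```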
UPDATE:
After looking into the implementation of multiprocessing.sharedctypes
(the CPython 3.6.2 shared memory block allocation code), I found that it always creates temporary files to be mmap
'ed, so it is not really a file-less solution.
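For reference, newer Python versions (3.8+) add multiprocessing.shared_memory, which allocates the block through the OS shared-memory facility rather than a temporary file on disk. A sketch of wrapping such a block with numpy without copying (size and dtype are arbitrary):

```python
from multiprocessing import shared_memory

import numpy as np

# Allocate a named OS shared-memory block (POSIX shm / Windows section).
shm = shared_memory.SharedMemory(create=True, size=8 * 1000)

# Zero-copy ndarray view over the shared block.
arr = np.ndarray((1000,), dtype=np.float64, buffer=shm.buf)
arr[:] = np.arange(1000.0)
total = float(arr.sum())

# Another process would attach with:
#   shared_memory.SharedMemory(name=shm.name)

del arr        # release the buffer view, or close() raises BufferError
shm.close()
shm.unlink()   # free the block once no process needs it
print(total)   # 499500.0
```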
If only pure RAM-based sharing is needed, someone has demonstrated it with multiprocessing.RawArray:
test of shared memory array / numpy integration