
load np.memmap without knowing shape

Is it possible to load a numpy.memmap without knowing the shape and still recover the shape of the data?

import numpy as np

filename = 'data.dat'  # placeholder path
data = np.arange(12, dtype='float32')
data.resize((3,4))
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
fp[:] = data[:]
del fp
newfp = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))

In the last line, I would like to omit the shape and still have newfp come back with the shape (3,4), just as happens with joblib.load. Is this possible? Thanks.

Michael asked Dec 15 '22 06:12


2 Answers

Not unless that information has been explicitly stored in the file somewhere. As far as np.memmap is concerned, the file is just a flat buffer.

I would recommend using np.save to persist numpy arrays, since this also preserves the metadata specifying their dimensions, dtypes etc. You can also load an .npy file as a memmap by passing the mmap_mode= parameter to np.load.
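
A minimal sketch of that round trip, in case it helps. The temporary file path is only a placeholder for this example; the point is that np.load with mmap_mode='r' hands back a memory-mapped array whose shape and dtype come from the .npy header:

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype='float32').reshape(3, 4)

# Placeholder location, used only for this example
path = os.path.join(tempfile.mkdtemp(), 'data.npy')

# np.save writes a small header (shape, dtype, memory order) ahead of the raw data
np.save(path, data)

# mmap_mode='r' maps the file read-only instead of reading it all into memory
newfp = np.load(path, mmap_mode='r')
print(type(newfp).__name__, newfp.shape, newfp.dtype)
```

No shape or dtype is passed on the loading side, which is the behaviour the question asks for.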

joblib.dump uses a combination of pickling to store generic Python objects and np.save to store numpy arrays.


To initialize an empty memory-mapped array backed by a .npy file you can use numpy.lib.format.open_memmap:

import numpy as np
from numpy.lib.format import open_memmap

# initialize an empty 10TB memory-mapped array
x = open_memmap('/tmp/bigarray.npy', mode='w+', dtype=np.ubyte, shape=(10**13,))

You might be surprised by the fact that this succeeds even if the array is larger than the total available disk space (my laptop only has a 500GB SSD, but I just created a 10TB memmap). This is possible because the file that's created is sparse.
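
The same function can also reopen such a file later. The sketch below uses a small array and a placeholder temporary path (both chosen just for illustration) to show that the shape and dtype are read back from the .npy header rather than supplied by the caller:

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Placeholder location, used only for this example
path = os.path.join(tempfile.mkdtemp(), 'roundtrip.npy')

# Create the .npy-backed memmap and fill it in place
fp = open_memmap(path, mode='w+', dtype='float32', shape=(3, 4))
fp[:] = np.arange(12, dtype='float32').reshape(3, 4)
fp.flush()
del fp

# Reopen read-only without giving shape or dtype; both come from the header
newfp = open_memmap(path, mode='r')
print(newfp.shape, newfp.dtype)
```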

Credit for discovering open_memmap should go to kiyo's previous answer.

ali_m answered Jan 07 '23 07:01


The answer from @ali_m is perfectly valid. I would like to mention my personal preference, in case it helps anyone. I always begin my memmap arrays with the shape as the first 2 elements. Doing this is as simple as:

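# Writing the memmap array
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
fp[:] = data[:]
del fp  # flush the data before reopening the file
# Reopen as a flat array with room for 2 extra header elements
fp = np.memmap(filename, dtype='float32', mode='r+', shape=(14,))
fp[2:] = fp[:-2]  # shift the data right by two slots
fp[:2] = [3, 4]   # store the shape (rows, cols) at the front
del fp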

Or simpler still:

# Writing the memmap array
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(14,))
fp[2:] = data.ravel()  # flatten first: a (3,4) array cannot be assigned to a (12,) slice
fp[:2] = [3, 4]
del fp

Then you can easily read the array as:

#reading the memmap array
newfp = np.memmap(filename, dtype='float32', mode='r')
# the header values are float32, so convert them to int before reshaping
row_size, col_size = newfp[0:2].astype(int)
newfp = newfp[2:].reshape((row_size, col_size))
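
One limitation of keeping the shape in the array itself is that it is stored in the data's dtype; float32, for instance, cannot represent every integer above 2**24 exactly. A possible variation (not part of the answer above) keeps the shape in a separate int64 header and maps the data with the offset argument of np.memmap. The helper names, the header layout, and the temporary path below are all hypothetical choices for this sketch:

```python
import os
import tempfile

import numpy as np

# Assumed layout for this sketch: two int64 values (rows, cols), then the raw data
HEADER_DTYPE = np.dtype('int64')
HEADER_ITEMS = 2
HEADER_BYTES = HEADER_DTYPE.itemsize * HEADER_ITEMS


def write_with_header(filename, data):
    """Write a 2-D array to filename, preceded by its shape as two int64 values."""
    # Map only the data region, which starts after the header bytes
    body = np.memmap(filename, dtype=data.dtype, mode='w+',
                     offset=HEADER_BYTES, shape=data.shape)
    body[:] = data
    body.flush()
    del body
    # Map the first two int64 slots of the same file and store the shape there
    header = np.memmap(filename, dtype=HEADER_DTYPE, mode='r+',
                       shape=(HEADER_ITEMS,))
    header[:] = data.shape
    header.flush()
    del header


def read_with_header(filename, dtype):
    """Return a read-only memmap of the data, using the shape stored in the header."""
    header = np.fromfile(filename, dtype=HEADER_DTYPE, count=HEADER_ITEMS)
    shape = tuple(int(n) for n in header)
    return np.memmap(filename, dtype=dtype, mode='r',
                     offset=HEADER_BYTES, shape=shape)


data = np.arange(12, dtype='float32').reshape(3, 4)
filename = os.path.join(tempfile.mkdtemp(), 'with_header.dat')  # placeholder path
write_with_header(filename, data)
newfp = read_with_header(filename, dtype='float32')
print(newfp.shape)
```

The dtype still has to be known when reading; only the shape is recovered from the file. If both should travel with the data, the .npy format already handles that.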

Rahul Murmuria answered Jan 07 '23 06:01