I'm working with a bunch of large numpy arrays, and as these started to chew up too much memory lately, I wanted to replace them with numpy.memmap
instances. The problem is, now and then I have to resize the arrays, and I'd preferably do that inplace. This worked quite well with ordinary arrays, but trying that on memmaps complains, that the data might be shared, and even disabling the refcheck does not help.
a = np.arange(10)
a.resize(20)
a
>>> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
a = np.memmap('bla.bin', dtype=int)
a
>>> memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
a.resize(20, refcheck=False)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-41-f1546111a7a1> in <module>()
----> 1 a.resize(20, refcheck=False)
ValueError: cannot resize this array: it does not own its data
Resizing the underlying mmap buffer works perfectly fine. The problem is how to reflect these changes to the array object. I've seen this workaround, but unfortunately it doesn't resize the array in place. There is also some numpy documentation about resizing mmaps, but it's clearly not working, at least with version 1.8.0. Any other ideas, how to override the inbuilt resizing checks?
With the help of Numpy numpy. resize(), we can resize the size of an array. Array can be of any shape but to resize it we just need the size i.e (2, 2), (2, 3) and many more.
Just to be clear: there's no "good" way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory, a combination of the number of its elements and the size of each element, is fixed and cannot be changed.
Python arrays do not have a resize method, and neither does lists.
Resize() resize() function is used to create a new array of different sizes and dimensions. resize() can create an array of larger size than the original array.
The issue is that the flag OWNDATA is False when you create your array. You can change that by requiring the flag to be True when you create the array:
>>> a = np.require(np.memmap('bla.bin', dtype=int), requirements=['O'])
>>> a.shape
(10,)
>>> a.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
>>> a.resize(20, refcheck=False)
>>> a.shape
(20,)
The only caveat is that it may create the array and make a copy to be sure the requirements are met.
Edit to address saving:
If you want to save the re-sized array to disk, you can save the memmap as a .npy formatted file and open as a numpy.memmap
when you need to re-open it and use as a memmap:
>>> a[9] = 1
>>> np.save('bla.npy',a)
>>> b = np.lib.format.open_memmap('bla.npy', dtype=int, mode='r+')
>>> b
memmap([0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Edit to offer another method:
You may get close to what you're looking for by re-sizing the base mmap (a.base or a._mmap, stored in uint8 format) and "reloading" the memmap:
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> a[3] = 7
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
>>> a.base.resize(20*8)
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
If I'm not mistaken, this achieves essentially what @wwwslinger's second solution does, but without having to manually specify the size of the new memmap in bits:
In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
In [2]: a[3] = 7
In [3]: a
Out[3]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
In [4]: a.flush()
# this will append to the original file as much as is necessary to satisfy
# the new shape requirement, given the specified dtype
In [5]: new_a = np.memmap('bla.bin', mode='r+', dtype=int, shape=(20,))
In [6]: new_a
Out[6]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In [7]: a[-1] = 10
In [8]: a
Out[8]: memmap([ 0, 0, 0, 7, 0, 0, 0, 0, 0, 10])
In [9]: a.flush()
In [11]: new_a
Out[11]:
memmap([ 0, 0, 0, 7, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0])
This works well when the new array needs to be bigger than the old one, but I don't think this type of approach will allow for the size of the memory-mapped file to be automatically truncated if the new array is smaller.
Manually resizing the base, as in @wwwslinger's answer, seems to allow the file to be truncated, but it doesn't reduce the size of the array.
For example:
# this creates a memory mapped file of 10 * 8 = 80 bytes
In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
In [2]: a[:] = range(1, 11)
In [3]: a.flush()
In [4]: a
Out[4]: memmap([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# now truncate the file to 40 bytes
In [5]: a.base.resize(5*8)
In [6]: a.flush()
# the array still has the same shape, but the truncated part is all zeros
In [7]: a
Out[7]: memmap([1, 2, 3, 4, 5, 0, 0, 0, 0, 0])
In [8]: b = np.memmap('bla.bin', mode='r+', dtype=int, shape=(5,))
# you still need to create a new np.memmap to change the size of the array
In [9]: b
Out[9]: memmap([1, 2, 3, 4, 5])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With