In Python, is it possible to overload Numpy's memmap to delete itself when the memmap object is no longer referenced?

Tags: python, numpy

I am trying to use memmap when certain data doesn't fit in memory, relying on memmap's ability to trick code into thinking it's just an ndarray. To expand on this way of using memmap, I was wondering whether it would be possible to overload memmap's dereference operator so that it deletes the memmap file.

So for example:

from tempfile import mkdtemp
import os.path as path
import numpy as np

a = np.zeros((3, 4))  # stand-in for the real data (assumed for this example)
filename = path.join(mkdtemp(), 'tmpfile.dat')

def work():
    out = np.memmap(filename, dtype=a.dtype, mode='w+', shape=a.shape)

work()
# At this point out is out of scope, so the overloaded
# dereference function would delete tmpfile.dat

Does this sound feasible/has this been done? Is there something I am not thinking of?

Thank you!

Philipp Cannons asked Sep 16 '25 04:09



2 Answers

Just delete the file after it has been opened by np.memmap. The system will then delete the file's contents once the last reference to the file descriptor is closed.
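As a rough illustration of that idea, here is a minimal sketch that assumes a POSIX system (on Windows, removing a file that is still mapped is expected to fail). The names tmpdir, filename and out are made up for the example.

```python
import os
import tempfile

import numpy as np

# create a scratch file path inside a fresh temporary directory
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, 'tmpfile.dat')

out = np.memmap(filename, dtype=np.float64, mode='w+', shape=(3, 4))

# unlink the path right away; on POSIX the data stays alive while it is mapped
os.remove(filename)
os.rmdir(tmpdir)  # the directory is empty now, so it can be removed too

# the array is still usable even though the path no longer exists
out[...] = 1.0
total = float(out.sum())
path_exists = os.path.exists(filename)

# dropping the last reference lets the OS reclaim the disk space
del out
```

The trade-off is that the file can no longer be reopened by name, so this only suits scratch data that a single process uses.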

Python temporary files work like this and can very conveniently be used with the with context manager construct:

import tempfile
import numpy as np

with tempfile.NamedTemporaryFile() as f:
    # the file still has a name on the filesystem while f is open,
    # so open it a second time by name (will not work on Windows)
    x = np.memmap(f.name, dtype=np.float64, mode='w+', shape=(3,4))
# leaving the with block closed f and unlinked the path, but the data still
# exists on disk and uses space because x holds it open (see /proc/<pid>/fd)
del x
# now all references are gone and the file is properly deleted
jtaylor answered Sep 19 '25 07:09



For the case where we do not want to use with and would rather have a class that handles the cleanup for us:

import os
import tempfile

import numpy as np


class tempmap(np.memmap):
    """
    Extension of numpy memmap to automatically map to a file stored in a temporary directory.
    Useful as a fast storage option when numpy arrays become large and we just want to do some quick experimental stuff.
    """
    def __new__(subtype, dtype=np.uint8, mode='w+', offset=0,
                shape=None, order='C'):
        # NamedTemporaryFile removes the file from disk when it is closed
        ntf = tempfile.NamedTemporaryFile()
        self = np.memmap.__new__(subtype, ntf, dtype, mode, offset, shape, order)
        # keep the file object alive for as long as the array exists
        self.temp_file_obj = ntf
        return self

    def __del__(self):
        # arrays derived from this one (e.g. views) do not carry temp_file_obj
        if hasattr(self,'temp_file_obj') and self.temp_file_obj is not None:
            self.temp_file_obj.close()
            del self.temp_file_obj

def np_as_tmp_map(nparray):
    tmpmap = tempmap(dtype=nparray.dtype, mode='w+', shape=nparray.shape)
    tmpmap[...] = nparray
    return tmpmap


def test_memmap():
    """Test that deleting a temp memmap also deletes the file."""
    x = np_as_tmp_map(np.zeros((10, 10), np.float64))
    name = x.temp_file_obj.name
    del x
    assert not os.path.isfile(name)
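
A possible alternative to defining __del__ is the standard library's weakref.finalize, which registers a callback that runs when the array is garbage collected or at interpreter exit. This is a different technique from the subclass above, shown only as a sketch: the helper name temp_memmap is hypothetical, and removing a file that is still mapped assumes a POSIX system.

```python
import os
import tempfile
import weakref

import numpy as np


def temp_memmap(shape, dtype=np.float64):
    """Return a memmap backed by a scratch file that is removed with the array."""
    fd, filename = tempfile.mkstemp(suffix='.dat')
    os.close(fd)  # np.memmap opens the path itself
    arr = np.memmap(filename, dtype=dtype, mode='w+', shape=shape)
    # call os.remove(filename) once arr is collected (or at interpreter exit)
    weakref.finalize(arr, os.remove, filename)
    return arr


x = temp_memmap((10, 10))
x[...] = 2.0
name = x.filename  # np.memmap records the path of the backing file
existed_before = os.path.isfile(name)
del x
exists_after = os.path.isfile(name)
```

Because the finalizer is attached to one specific array object, views or slices that outlive it keep working on POSIX but no longer have a file path behind them.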
Holi answered Sep 19 '25 07:09
