I'm working with a bunch of numpy arrays that don't all fit in RAM, so I need to periodically save them to and load them from the disk.
Usually, I know which ones I'll need to read ahead of time, so I'd like to hide the latency by issuing something like a "prefetch" instruction in advance.
How should I do this?
(There is a similar question about TensorFlow; however, I am not using TensorFlow, so I don't want to create a dependency on it.)
If you're using Python 3.3+ on a UNIX-like system, you can use os.posix_fadvise with the POSIX_FADV_WILLNEED flag to ask the kernel to start reading a file into the page cache right after opening it. For example:
    import os
    import pickle

    with open(filepath, 'rb') as f:
        # Advise the kernel that the whole file will be needed soon.
        os.posix_fadvise(f.fileno(), 0, os.stat(f.fileno()).st_size,
                         os.POSIX_FADV_WILLNEED)
        # ... do other stuff ...
        # If you're lucky, the OS has asynchronously prefetched the
        # file contents into the page cache by the time you need them.
        stuff = pickle.load(f)
Aside from that, Python doesn't directly offer any other APIs for explicit prefetching, but you could use ctypes to call an OS-appropriate prefetch function yourself, or run a background thread that does nothing but read and discard blocks from the file, to improve the odds that the data is already in the system cache when you need it.
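The background-thread approach might look something like the sketch below (the `prefetch` helper and its `blocksize` parameter are names I've made up for illustration). It simply reads the file sequentially and throws the bytes away, so the kernel pulls the data into the page cache while your main thread does other work:

```python
import threading


def prefetch(filepath, blocksize=1 << 20):
    """Warm the OS page cache by reading the file in a background thread.

    Illustrative sketch: reads and discards blocks so a later real read
    is (hopefully) served from cache. Best-effort only.
    """
    def _warm():
        try:
            with open(filepath, 'rb') as f:
                while f.read(blocksize):
                    pass  # discard; we only want the cache side effect
        except OSError:
            pass  # prefetching is opportunistic; ignore I/O errors

    t = threading.Thread(target=_warm, daemon=True)
    t.start()
    return t  # call t.join() if you ever need to wait for it
```

You'd call `prefetch(path)` well before the real load, do other work, and then open and deserialize the file as usual; whether this actually hides latency depends on how much free memory the OS has for caching.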