 

Fastest way to read a whole HDF5 file containing NumPy arrays into memory

I use:

import h5py

# Open the file read-only and copy every dataset into a
# dict of in-memory numpy arrays.
f = h5py.File('myfile.h5', 'r')
d = {}
for k in f.keys():   # f.iterkeys() is Python 2 only
    d[k] = f[k][:]

to read the whole HDF5 file (2 GB: 1000 numpy arrays of 2 MB each) into memory.

Is there a faster way to load the entire contents of the HDF5 file into memory?

(Maybe the loop does a lot of seeking in the file, because the datasets are not stored on disk in the same order that for k in f.keys() iterates over them?)
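
If seeking really is the bottleneck, one thing to try (a sketch, untested against this particular file) is to read the datasets in their on-disk order. h5py's low-level get_offset() call returns the file offset of a contiguously stored dataset, and None for chunked or compressed ones, which this sketch simply sorts last:

import h5py

def disk_offset(f, name):
    # get_offset() gives the file offset for contiguous datasets;
    # it returns None for chunked/compressed storage.
    off = f[name].id.get_offset()
    return off if off is not None else float('inf')

with h5py.File('myfile.h5', 'r') as f:
    d = {}
    for k in sorted(f.keys(), key=lambda name: disk_offset(f, name)):
        d[k] = f[k][:]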

Asked Mar 13 '14 by Basj

1 Answer

PyTables (another Python HDF5 library) supports loading the whole file into memory using the H5FD_CORE driver. h5py appears to support an in-memory (core) driver as well (see File Drivers). So just do

import h5py
# 'core' driver: the entire file is read into memory on open
f = h5py.File('myfile.h5', 'r', driver='core')

and you are done, as the file then already resides in memory.
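
For the original dict-of-arrays use case, a minimal sketch combining the core driver with the loading loop (same hypothetical myfile.h5 as in the question) could look like this:

import h5py

# Opening with driver='core' pays the cost of one sequential
# read of the whole 2 GB file into RAM.
f = h5py.File('myfile.h5', 'r', driver='core')

# Materializing each dataset as a numpy array is now a
# memory-to-memory copy rather than a series of disk seeks.
d = {k: f[k][:] for k in f.keys()}
f.close()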

Answered Nov 14 '22 by Joe