Is there any way to store an array in an HDF5 file if it is too big to load into memory?
If I do something like this:

import h5py
import numpy as np

f = h5py.File('test.hdf5', 'w')
f['mydata'] = np.zeros(2**32)
I get a memory error.
The solution is chunked storage. Keep in mind, though, that the smaller the chunks, the more your HDF5 file is bloated by per-chunk metadata; try to find a balance between chunk sizes that fit your access pattern and the size overhead they introduce in the file.
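That overhead is easy to observe directly. Here is a minimal sketch (the file names and the 2**24-element dataset are arbitrary choices for illustration) that writes the same data with two different chunk sizes and compares the resulting file sizes:

import os
import h5py

# Same logical dataset written twice, once with small chunks and once
# with large ones; smaller chunks mean more per-chunk index metadata.
for name, chunk in [('small_chunks.h5', 1024), ('large_chunks.h5', 2**20)]:
    with h5py.File(name, 'w') as f:
        d = f.create_dataset('x', shape=(2**24,), dtype='f4', chunks=(chunk,))
        d[:] = 1.0  # write everything so every chunk is allocated on disk
    print(name, os.path.getsize(name), 'bytes')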
More generally, chunking in HDF5 lets you specify the N-dimensional "shape" that best fits your access pattern. When the time comes to write data to disk, HDF5 splits the data into "chunks" of the specified shape, flattens them, and writes them to disk.
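As a minimal sketch (the file name, dataset name, and chunk shape below are arbitrary choices, not recommendations), you can pass an explicit chunk shape to create_dataset:

import h5py

# 2**32 float32 values stored in chunks of 2**20 elements each;
# chunks are allocated on disk only once something is written to them.
with h5py.File('chunked_example.h5', 'w') as f:
    dset = f.create_dataset('big', shape=(2**32,), dtype='f4',
                            chunks=(2**20,))
    dset[:2**20] = 1.0  # touches (and allocates) only the first chunk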
The np.zeros(2**32) approach fails because it builds the whole 32 GiB array in RAM before anything reaches the file. According to the documentation, you can instead use create_dataset to create a chunked dataset stored in the HDF5 file without allocating it in memory. Example:
>>> import h5py
>>> f = h5py.File('test.h5', 'w')
>>> arr = f.create_dataset('mydata', (2**32,), chunks=True)
>>> arr
<HDF5 dataset "mydata": shape (4294967296,), type "<f4">
Slicing the HDF5 dataset returns NumPy arrays:
>>> arr[:10]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
>>> type(arr[:10])
numpy.ndarray
You can set values just as with a NumPy array:
>>> arr[3:5] = 3
>>> arr[:6]
array([ 0., 0., 0., 3., 3., 0.], dtype=float32)
I don't know if this is the most efficient way, but you can iterate over the whole array chunk by chunk, for instance to fill it with random values:
>>> import numpy as np
>>> for i in range(0, arr.size, arr.chunks[0]):
...     n = min(arr.chunks[0], arr.size - i)  # last block may be shorter
...     arr[i:i + n] = np.random.randn(n)
...
>>> arr[:5]
array([ 0.62833798, 0.03631227, 2.00691652, -0.16631022, 0.07727782], dtype=float32)
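You can read the data back in the same chunk-wise fashion, so the full array never needs to fit in memory at once. As a rough sketch (not necessarily the fastest approach), computing the mean of the whole dataset:

>>> total = 0.0
>>> for i in range(0, arr.size, arr.chunks[0]):
...     n = min(arr.chunks[0], arr.size - i)
...     total += arr[i:i + n].sum(dtype='f8')  # accumulate in float64
...
>>> mean = total / arr.size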