I'm trying to increase cache size for my HDF5 files, but it doesn't seem to be working. This is what I have:
import h5py

with h5py.File("test.h5", 'w') as fid:
    # current cache settings of the file
    cacheSettings = list(fid.id.get_access_plist().get_cache())
    print(cacheSettings)

    # increase the cache size
    cacheSettings[2] = int(5 * cacheSettings[2])
    print(cacheSettings)

    # apply the new settings, then read them back from the file
    fid.id.get_access_plist().set_cache(*cacheSettings)
    print(fid.id.get_access_plist().get_cache())
Here is the output:
[0, 521, 1048576, 0.75]
[0, 521, 5242880, 0.75]
(0, 521, 1048576, 0.75)
Any idea why reading works, but setting doesn't?
Closing and reopening the file doesn't seem to help either.
If you are using h5py version 2.9.0 or newer, see Mike's answer below.
According to the docs, get_access_plist() returns a copy of the file access property list, so it is not surprising that modifying the copy does not affect the original.
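You can see the copy semantics directly; here is a small sketch of my own (not from the original answer), reusing the open file fid from the question's with block. Each call to get_access_plist() hands back an independent property list, so a change made through one copy never shows up in another:

plistA = fid.id.get_access_plist()
plistB = fid.id.get_access_plist()
plistA.set_cache(0, 521, 5242880, 0.75)
print(plistA.get_cache())   # (0, 521, 5242880, 0.75) -- only the modified copy changed
print(plistB.get_cache())   # (0, 521, 1048576, 0.75) -- the file's own settings are untouched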
It appears the high-level interface does not provide a way to change the cache settings.
Here is how you could do it using the low-level interface.
import h5py

# build a fresh file-access property list and enlarge its chunk cache
propfaid = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
settings = list(propfaid.get_cache())
print(settings)
# [0, 521, 1048576, 0.75]
settings[2] *= 5
propfaid.set_cache(*settings)
settings = propfaid.get_cache()
print(settings)
# (0, 521, 5242880, 0.75)
The above creates a PropFAID. We can then open the file and get a FileID this way:
import contextlib

with contextlib.closing(h5py.h5f.open(
        filename, flags=h5py.h5f.ACC_RDWR, fapl=propfaid)) as fid:
    # <h5py.h5f.FileID object at 0x9abc694>
    settings = list(fid.get_access_plist().get_cache())
    print(settings)
    # [0, 521, 5242880, 0.75]
And we can use the fid to open the file with the high-level interface by passing fid to h5py.File:
f = h5py.File(fid)
print(f.id.get_access_plist().get_cache())
# (0, 521, 5242880, 0.75)
Thus, you can still use the high-level interface, but it takes some fiddling to get there. On the other hand, if you distill it to just the essentials, perhaps it isn't so bad:
import h5py
import contextlib

filename = '/tmp/foo.hdf5'
propfaid = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
settings = list(propfaid.get_cache())
settings[2] *= 5
propfaid.set_cache(*settings)

with contextlib.closing(h5py.h5f.open(filename, fapl=propfaid)) as fid:
    f = h5py.File(fid)
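A couple of details worth flagging if you adapt this on a recent h5py/Python 3 setup (my observations, not part of the original answer): the low-level calls expect a bytes filename, and h5f.open only opens a file that already exists, so for a brand-new file you would go through h5py.h5f.create with the same fapl. A rough sketch:

import contextlib
import h5py

# assumed path; the low-level interface wants bytes, hence the b'...' literal
filename = b'/tmp/foo.hdf5'

propfaid = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
settings = list(propfaid.get_cache())
settings[2] *= 5
propfaid.set_cache(*settings)

# create (truncate) the file instead of opening an existing one,
# passing the same file-access property list
with contextlib.closing(h5py.h5f.create(
        filename, flags=h5py.h5f.ACC_TRUNC, fapl=propfaid)) as fid:
    f = h5py.File(fid)
    print(f.id.get_access_plist().get_cache())
    # expected: (0, 521, 5242880, 0.75)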
As of h5py version 2.9.0, this behavior is available directly through the main h5py.File interface. Three parameters control the "raw data chunk cache": rdcc_nbytes, rdcc_w0, and rdcc_nslots, which are documented here. The OP was trying to adjust the rdcc_nbytes setting, which can now simply be done as:
import h5py

with h5py.File("test.h5", "w", rdcc_nbytes=5242880) as f:
    f.create_dataset(...)
In this case, I assume that you know how much space you actually need, rather than just multiplying the default by 5 as the OP wanted. The current default values are the same as the ones the OP found. Of course, if you really wanted to do this programmatically, you could open the file once, read its cache settings, close it, and then reopen it with the desired parameters (see the sketch below).
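A minimal sketch of that programmatic approach, my own illustration assuming the same test.h5 file and the 2.9.0+ keyword arguments:

import h5py

filename = "test.h5"

# first pass: read the current cache settings through the high-level interface
with h5py.File(filename, "a") as f:
    mdc, nslots, nbytes, w0 = f.id.get_access_plist().get_cache()

# second pass: reopen with the chunk cache scaled up five-fold
with h5py.File(filename, "a",
               rdcc_nslots=nslots, rdcc_nbytes=5 * nbytes, rdcc_w0=w0) as f:
    print(f.id.get_access_plist().get_cache())
    # expected with the default starting values: (0, 521, 5242880, 0.75)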