
How to set cache settings while using h5py high level interface?

Tags: python, h5py

I'm trying to increase cache size for my HDF5 files, but it doesn't seem to be working. This is what I have:

import h5py

with h5py.File("test.h5", 'w') as fid:
    # read the current cache settings of the file
    cacheSettings = list(fid.id.get_access_plist().get_cache())
    print(cacheSettings)
    # increase the chunk cache size in bytes (index 2) fivefold
    cacheSettings[2] = int(5 * cacheSettings[2])
    print(cacheSettings)
    # try to write the new settings to the file
    fid.id.get_access_plist().set_cache(*cacheSettings)
    # read the cache settings back from the file
    print(fid.id.get_access_plist().get_cache())

Here is the output:

[0, 521, 1048576, 0.75]
[0, 521, 5242880, 0.75]
(0, 521, 1048576, 0.75)

Any idea why reading works, but setting doesn't?
Closing and reopening the file doesn't seem to help either.

Enno Gröper asked Feb 01 '13 19:02


2 Answers

If you are using h5py version 2.9.0 or newer, see Mike's answer.


According to the docs, get_access_plist() returns a copy of the file access property list. So it is not surprising that modifying the copy does not affect the original.
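To see the copy semantics directly, here is a minimal sketch. The helper name plist_edit_is_lost is made up for illustration, and h5py is imported inside the function only so that the definition itself does not require h5py. It changes the cache size on one copy of the property list and then asks the file for a fresh copy:

```python
def plist_edit_is_lost(path):
    """Return cache settings (before, on the edited copy, re-read from the file).

    Hypothetical helper: it creates (and truncates) an HDF5 file at `path`.
    """
    import h5py  # lazy import; assumes h5py is installed

    with h5py.File(path, "w") as f:
        plist = f.id.get_access_plist()   # a copy of the file's access property list
        before = plist.get_cache()

        settings = list(before)
        settings[2] *= 5                  # index 2 is the chunk cache size in bytes
        plist.set_cache(*settings)        # modifies only this copy

        on_copy = plist.get_cache()
        on_file = f.id.get_access_plist().get_cache()  # a fresh copy from the file
    return before, on_copy, on_file
```

If the docs are right, on_copy should show the enlarged cache while on_file should still equal before, which matches the output in the question.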

It appears the high-level interface does not provide a way to change the cache settings.

Here is how you could do it using the low-level interface.

import h5py

# create a fresh file access property list (a PropFAID)
propfaid = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
settings = list(propfaid.get_cache())
print(settings)
# [0, 521, 1048576, 0.75]

# multiply the chunk cache size in bytes by 5
settings[2] *= 5
propfaid.set_cache(*settings)
settings = propfaid.get_cache()
print(settings)
# (0, 521, 5242880, 0.75)

The above creates a PropFAID. We can then open the file and get a FileID this way:

import contextlib
# filename must name an existing HDF5 file; on Python 3, h5py.h5f.open expects it as bytes
with contextlib.closing(h5py.h5f.open(
                        filename, flags=h5py.h5f.ACC_RDWR, fapl=propfaid)) as fid:
    # <h5py.h5f.FileID object at 0x9abc694>
    settings = list(fid.get_access_plist().get_cache())
    print(settings)
    # [0, 521, 5242880, 0.75]

And we can use the fid to open the file with the high-level interface by passing fid to h5py.File:

    f = h5py.File(fid)
    print(f.id.get_access_plist().get_cache())
    # (0, 521, 5242880, 0.75)

Thus, you can still use the high-level interface, but it takes some fiddling to get there. On the other hand, if you distill it to just the essentials, perhaps it isn't so bad:

import h5py
import contextlib

filename = b'/tmp/foo.hdf5'  # must already exist; the low-level API expects bytes on Python 3
propfaid = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
settings = list(propfaid.get_cache())
settings[2] *= 5
propfaid.set_cache(*settings)
with contextlib.closing(h5py.h5f.open(filename, fapl=propfaid)) as fid:
    f = h5py.File(fid)

unutbu answered Sep 20 '22 16:09


As of h5py version 2.9.0, this behavior is available directly through the main h5py.File interface. Three parameters control the "raw data chunk cache": rdcc_nbytes, rdcc_w0, and rdcc_nslots. They are documented in the chunk cache section of the h5py File object docs. The OP was trying to adjust the rdcc_nbytes setting, which can now be done simply as

import h5py

with h5py.File("test.h5", "w", rdcc_nbytes=5242880) as f:
    f.create_dataset(...)

In this case, I assume that you know how much space you actually need, rather than just multiplying by 5 as the OP wanted. The current default values are the same as the OP found. Of course, if you really wanted to do this programmatically, you could just open the file once, get the cache settings, close it, and then reopen it with the desired parameters.
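For what it's worth, the open, read, close, reopen idea might look roughly like the sketch below. The function names scaled_cache_kwargs and reopen_with_scaled_cache are made up for illustration, the file is assumed to exist already (the first open is read-only), and h5py 2.9.0 or newer is assumed for the rdcc_* keywords:

```python
def scaled_cache_kwargs(settings, factor):
    """Turn a get_cache() tuple into h5py.File keyword arguments.

    settings is (mdc_nelmts, rdcc_nslots, rdcc_nbytes, rdcc_w0); only
    rdcc_nbytes is scaled, the other chunk cache values pass through.
    """
    _mdc_nelmts, nslots, nbytes, w0 = settings
    return {
        "rdcc_nslots": nslots,
        "rdcc_nbytes": int(nbytes * factor),
        "rdcc_w0": w0,
    }


def reopen_with_scaled_cache(path, mode="r", factor=5):
    """Open `path` once to read its cache settings, then reopen it scaled."""
    import h5py  # lazy import; assumes h5py >= 2.9.0

    # First pass: read the current settings, then let the file close.
    with h5py.File(path, "r") as probe:
        settings = probe.id.get_access_plist().get_cache()

    # Second pass: reopen with the enlarged chunk cache; the caller closes it.
    return h5py.File(path, mode, **scaled_cache_kwargs(settings, factor))
```

With the defaults reported in the question, scaled_cache_kwargs((0, 521, 1048576, 0.75), 5) gives rdcc_nbytes=5242880, the same value passed by hand above. Note that passing mode "w" to the second open would truncate the file.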

Mike answered Sep 18 '22 16:09