
How to store dictionary in HDF5 dataset

Tags: python, h5py

I have a dictionary whose keys are datetime objects and whose values are tuples of integers:

    >>> list(d.items())[0]
    (datetime.datetime(2012, 4, 5, 23, 30), (14, 1014, 6, 3, 0))

I want to store it in an HDF5 dataset, but if I try to dump the dictionary directly, h5py raises an error:

TypeError: Object dtype dtype('object') has no native HDF5 equivalent

What would be the best way to transform this dictionary so that I can store it in an HDF5 dataset?

Specifically, I don't want to just dump the dictionary into a NumPy array, as that would complicate retrieving data by datetime query.

asked May 11 '13 07:05 by theta




2 Answers

I found two ways to do this:

I) Convert each datetime object to a string and use it as the dataset name

    import h5py
    import numpy as np

    h = h5py.File('myfile.hdf5', 'a')
    for k, v in d.items():
        # int16, because values such as 1014 do not fit in int8
        h.create_dataset(k.strftime('%Y-%m-%dT%H:%M:%SZ'), data=np.array(v, dtype=np.int16))

The data can then be accessed by querying the key strings (the dataset names). For example:

    for ds in h.keys():
        if '2012-04' in ds:
            print(h[ds][()])  # .value was removed in h5py 3.0
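Option I can be sketched as a full round trip, assuming h5py 3.x and NumPy are installed. The in-memory file (`driver='core'`, `backing_store=False`), the `KEY_FMT` constant, and the sample dictionary are illustrative choices and not part of the original answer. The sketch uses `int16` because 1014 does not fit in `int8`, and it parses the dataset names back into datetime objects so that a date range can be queried instead of a substring.

```python
from datetime import datetime

import h5py
import numpy as np

KEY_FMT = '%Y-%m-%dT%H:%M:%SZ'  # hypothetical name; same format string as in the answer

# Sample data shaped like the dictionary in the question.
d = {
    datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime(2012, 5, 1, 0, 0): (2, 998, 7, 1, 0),
}

# In-memory file so the sketch leaves nothing on disk (a choice made for the demo).
with h5py.File('demo.hdf5', 'w', driver='core', backing_store=False) as h:
    # One dataset per dictionary entry, named after the formatted datetime key.
    for k, v in d.items():
        h.create_dataset(k.strftime(KEY_FMT), data=np.array(v, dtype=np.int16))

    # Parse dataset names back into datetime objects and filter on a range.
    start, end = datetime(2012, 4, 1), datetime(2012, 5, 1)
    april = {}
    for name in h.keys():
        ts = datetime.strptime(name, KEY_FMT)
        if start <= ts < end:
            april[ts] = tuple(int(x) for x in h[name][()])

print(april)
```

Because the format string orders fields from year down to second, the dataset names also sort chronologically as plain strings.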

II) Convert each datetime object to nested subgroups

    h = h5py.File('myfile.hdf5', 'a')
    for k, v in d.items():
        # int16, because values such as 1014 do not fit in int8
        h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int16))

Notice the forward slashes in the strftime format string, which create the corresponding subgroups in the HDF5 file. Data can be accessed directly, as in h['2012']['04']['05']['23:30'][()], by iterating with the iterators h5py provides, or by applying a custom function through visititems().
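The `visititems()` route mentioned above could look roughly like this, assuming h5py 3.x and NumPy. The `collect` callback, the `found` dictionary, the in-memory file, and the sample data are hypothetical names and values introduced for illustration.

```python
from datetime import datetime

import h5py
import numpy as np

# Sample data shaped like the dictionary in the question.
d = {
    datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime(2012, 4, 6, 0, 0): (15, 1013, 5, 2, 0),
    datetime(2012, 5, 1, 12, 0): (2, 998, 7, 1, 0),
}

found = {}

def collect(name, obj):
    # visititems calls this for every group and dataset below the
    # starting group; keep only the datasets.
    if isinstance(obj, h5py.Dataset):
        found[name] = obj[()].tolist()

# In-memory file so the sketch leaves nothing on disk (a choice made for the demo).
with h5py.File('demo.hdf5', 'w', driver='core', backing_store=False) as h:
    # Slashes in the name make h5py create the intermediate groups.
    for k, v in d.items():
        h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int16))

    # Walk only the April 2012 subtree; names are relative to that group.
    h['2012/04'].visititems(collect)

print(found)
```

Starting the walk at a subgroup restricts the query to one month without any string matching on the names.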

For simplicity, I chose the first option.

answered Sep 20 '22 15:09 by theta


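This question relates to the more general problem of storing an arbitrary dictionary in HDF5. One approach is to convert the dictionary to a string with str() and, to recover it, parse that string with ast.literal_eval() from the standard-library ast module. This only works when the keys and values are Python literals (numbers, strings, tuples, lists, and so on); a datetime key would first have to be converted, for example to a string. The following session gives an example.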

    >>> import ast
    >>> d = {1: "a", 2: "b"}
    >>> s = str(d)
    >>> s
    "{1: 'a', 2: 'b'}"
    >>> ast.literal_eval(s)
    {1: 'a', 2: 'b'}
    >>> type(ast.literal_eval(s))
    <class 'dict'>
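Applied to the dictionary from the question, the string approach needs one extra step, because `ast.literal_eval()` only accepts literals and the repr of a datetime is a constructor call. The sketch below uses only the standard library and covers just the string conversion, not the HDF5 write. Converting the keys with `isoformat()` is an assumption made here, not something the answer prescribes, and `roundtrip_ok`, `as_text`, and `restored` are illustrative names.

```python
import ast
from datetime import datetime

d = {datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0)}

# str(d) contains a datetime.datetime(...) call, which literal_eval rejects.
try:
    ast.literal_eval(str(d))
    roundtrip_ok = True
except ValueError:
    roundtrip_ok = False

# Converting the keys to ISO 8601 strings first leaves only literals in the repr.
as_text = str({k.isoformat(): v for k, v in d.items()})

# Parse the string back and rebuild the datetime keys.
restored = {
    datetime.fromisoformat(k): v
    for k, v in ast.literal_eval(as_text).items()
}

print(roundtrip_ok, restored == d)
```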
answered Sep 19 '22 15:09 by Ameet Deshpande