
Is h5py capable of converting python dictionaries to hdf5 groups automatically?

Tags:

hdf5

h5py

I have been using scipy.io to save my structured data (lists and dictionaries filled with ndarrays of different shapes). Since the v7.3 MAT-file format is going to replace the old v7 format some day, I am thinking about switching to HDF5 to store my data, more specifically h5py for Python. However, I noticed that I cannot save my dictionaries as easily as:

import scipy.io as sio
data = {'data': 'Complicated structure data'}
sio.savemat('fileName.mat', data)

Instead, I have to call h5py's create_group one group at a time to replicate the structure of the Python dictionary. For very large structures, this is infeasible. Is there an easy way to automatically convert Python dictionaries to HDF5 groups?

Thank you!

-Shawn

Yuxiang Wang asked Jul 31 '13 17:07




1 Answer

I needed to do this kind of thing all the time, and decided it would be neat to make an HDF5 version of pickle: https://github.com/telegraphic/hickle

The motivation was storing Python dictionaries of NumPy arrays, which sounds like what you're after:

import hickle as hkl
import numpy as np
data = {
        'dataset1' : np.zeros((100,100)),
        'dataset2' : np.random.random((100,100))
        }
hkl.dump(data, 'output_filename.hkl')

You should be able to install it from PyPI (pip install hickle), or download it from GitHub.
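If you would rather not add a dependency, the same idea can be sketched with plain h5py by walking the dictionary recursively. This is only a minimal sketch, not how hickle works internally: the helper names save_dict_to_group and load_group_to_dict and the file name are made up for illustration, and it assumes every key is a string and every leaf value is something h5py can store directly as a dataset (an ndarray, number, or string). Lists of differently shaped arrays would need extra handling.

```python
import os
import tempfile

import h5py
import numpy as np


def save_dict_to_group(group, data):
    """Recursively write a nested dict into an open h5py group (hypothetical helper)."""
    for key, value in data.items():
        if isinstance(value, dict):
            # A nested dict becomes a subgroup with the same name.
            save_dict_to_group(group.create_group(str(key)), value)
        else:
            # Assumption: the leaf is something h5py can store as a dataset.
            group.create_dataset(str(key), data=value)


def load_group_to_dict(group):
    """Recursively read an h5py group back into a nested dict (hypothetical helper)."""
    out = {}
    for key, item in group.items():
        if isinstance(item, h5py.Group):
            out[key] = load_group_to_dict(item)
        else:
            out[key] = item[()]  # read the whole dataset into memory
    return out


data = {
    'dataset1': np.zeros((100, 100)),
    'nested': {'dataset2': np.random.random((100, 100))},
}

# Illustrative path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), 'output_filename.h5')

# An h5py.File is itself the root group, so it can be passed straight in.
with h5py.File(path, 'w') as f:
    save_dict_to_group(f, data)

with h5py.File(path, 'r') as f:
    restored = load_group_to_dict(f)
```

After this runs, restored should have the same nesting as data, with each array read back from disk.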

Cheers, Danny

telegraphic answered Nov 02 '22 23:11