
H5py store list of list of strings

Tags: python, hdf5, h5py

Is it possible in h5py to create a dataset that consists of lists of strings? I tried to create a nested variable-length datatype, but this results in a segmentation fault in my Python interpreter.

import h5py

def create_dataset(h5py_file):
    data = [['I', 'am', 'a', 'sentence'], ['another', 'sentence']]
    string_dt = h5py.special_dtype(vlen=str)
    # Nesting one vlen dtype inside another is the step that segfaults.
    nested_dt = h5py.special_dtype(vlen=string_dt)
    h5py_file.create_dataset("sentences", data=data, dtype=nested_dt)
PKuhn asked Jun 17 '16 04:06



2 Answers

If you don't intend to edit the HDF5 file afterwards (for example, to write longer strings into it), you can simply store the data as fixed-length byte strings:

h5py_file.create_dataset("sentences", data=np.array(data, dtype='S'))
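A minimal sketch of how this could look end to end, not part of the original answer. It assumes the sentences are first padded with empty strings so that the array is rectangular, since a fixed-length `'S'` array cannot hold rows of different lengths. The file name is a placeholder and the file is kept in memory only.

```python
import h5py
import numpy as np

data = [['I', 'am', 'a', 'sentence'], ['another', 'sentence']]

# Pad every sentence to the length of the longest one (assumption:
# an empty string is an acceptable "no word" marker for your data).
width = max(len(sentence) for sentence in data)
padded = [sentence + [''] * (width - len(sentence)) for sentence in data]

# dtype='S' makes NumPy choose a fixed byte width that fits the longest word.
# This only works for ASCII text.
arr = np.array(padded, dtype='S')

# driver='core' with backing_store=False keeps the file in memory.
with h5py.File('sentences_fixed.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('sentences', data=arr)
    stored = f['sentences'][...]

# Drop the padding and decode the bytes back to str.
restored = [[word.decode('ascii') for word in row if word] for row in stored]
```

The trade-off is that every entry occupies the width of the longest word, and longer words cannot be written into the dataset later without recreating it.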
jan-glx answered Sep 20 '22 02:09



You should be able to get the functionality you want if you define your data as a NumPy array with dtype=object, as suggested in this post, rather than as a list of lists, and store it with a plain variable-length string dtype instead of a nested one.

import h5py
import numpy as np

def create_dataset(h5py_file):
    data = np.array([['I', 'am', 'a', 'sentence'], ['another', 'sentence']], dtype=object)
    string_dt = h5py.special_dtype(vlen=str)
    h5py_file.create_dataset("sentences", data=data, dtype=string_dt)
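If the sentences have different lengths, one alternative that avoids nested types entirely is to store all words in a flat variable-length string dataset and keep the sentence boundaries in a second dataset. The sketch below is an illustration of that idea, not the answer's own method. The dataset names `words` and `offsets` and the file name are hypothetical, and the file is kept in memory only.

```python
import h5py
import numpy as np

sentences = [['I', 'am', 'a', 'sentence'], ['another', 'sentence']]

# Flatten the words and record where each sentence starts and ends.
words = [word for sentence in sentences for word in sentence]
offsets = np.cumsum([0] + [len(sentence) for sentence in sentences])

string_dt = h5py.special_dtype(vlen=str)

with h5py.File('sentences_ragged.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('words', data=np.array(words, dtype=object), dtype=string_dt)
    f.create_dataset('offsets', data=offsets)
    raw = f['words'][...]
    bounds = f['offsets'][...]

# Depending on the h5py version, variable-length strings are read back
# as bytes or as str, so handle both.
decoded = [w.decode('utf-8') if isinstance(w, bytes) else w for w in raw]

# Rebuild the nested list from consecutive offset pairs.
restored = [decoded[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
```

Sentence `i` is then `words[offsets[i]:offsets[i + 1]]`, so a single sentence can be read without loading the others.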
Heather QC answered Sep 20 '22 02:09
