
Pandas can't read hdf5 file created with h5py


I get a pandas error when I try to read HDF5 files that I created with h5py. Am I doing something wrong?

    import h5py
    import numpy as np
    import pandas as pd

    h5_file = h5py.File('test.h5', 'w')
    h5_file.create_dataset('zeros', data=np.zeros(shape=(3, 5)), dtype='f')
    h5_file.close()

    pd_file = pd.read_hdf('test.h5', 'zeros')

This gives an error:

    TypeError: cannot create a storer if the object is not existing nor a value are passed

I also tried setting the key to '/zeros' (as I would with h5py when reading the file), with no luck.

If I use pandas.HDFStore to read it in, I get an empty store back:

    >>> store = pd.HDFStore('test.h5')
    >>> store
    <class 'pandas.io.pytables.HDFStore'>
    File path: test.h5
    Empty

I have no trouble reading the newly created file back with h5py:

    >>> h5_back = h5py.File('test.h5', 'r')
    >>> h5_back['/zeros']
    <HDF5 dataset "zeros": shape (3, 5), type "<f4">

Using these versions:

    Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    >>> pd.__version__
    '0.16.2'
    >>> h5py.__version__
    '2.5.0'

Many thanks in advance, Masha

asked Nov 10 '15 at 22:11 by Masha L.


1 Answer

I've worked a little on the pytables module in pandas.io, and from what I know, pandas' interaction with HDF files is limited to specific structures that pandas understands. To see what these look like, you can try:

    import pandas as pd
    import numpy as np

    # Write a small float32 DataFrame in the layout pandas itself uses.
    pd.DataFrame(np.zeros((3, 5), dtype=np.float32)).to_hdf('test.h5', key='test')

If you open 'test.h5' in HDFView, you will see a path /test with 4 items that are needed to recreate the DataFrame.

HDFView of test.h5
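If HDFView is not at hand, the same layout can be listed with h5py. This is only a sketch: it assumes PyTables is installed (pandas needs it for `to_hdf`) and that pandas uses its default "fixed" format, and the member names it prints may differ between pandas versions.

```python
import h5py
import numpy as np
import pandas as pd

# Write a small float32 DataFrame the way pandas does (overwrites test.h5).
df = pd.DataFrame(np.zeros((3, 5), dtype=np.float32))
df.to_hdf('test.h5', key='test', mode='w')

# Open the same file with h5py and list what pandas stored under /test.
with h5py.File('test.h5', 'r') as h5_file:
    members = sorted(h5_file['test'].keys())

# These are the pieces pandas expects to find when it rebuilds the DataFrame.
print(members)
```

A file written directly with h5py has none of these members, only the bare dataset, which is why pandas reports the store as empty.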

So I think your only option for reading in NumPy arrays is to read them in directly and then convert these to Pandas objects.
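A minimal sketch of that approach, using the file and dataset names from the question: read the dataset into a NumPy array with h5py, then wrap the array in a DataFrame. The variable names `zeros` and `df` are only illustrative.

```python
import h5py
import numpy as np
import pandas as pd

# Recreate the file from the question.
with h5py.File('test.h5', 'w') as h5_file:
    h5_file.create_dataset('zeros', data=np.zeros(shape=(3, 5)), dtype='f')

# Read the dataset into memory as a NumPy array; slicing with [:] copies
# the data out, so it stays usable after the file is closed.
with h5py.File('test.h5', 'r') as h5_file:
    zeros = h5_file['zeros'][:]

# Convert the plain array to a pandas object.
df = pd.DataFrame(zeros)
print(df.shape)
```

The resulting DataFrame has default integer row and column labels, because a bare h5py dataset carries no index or column names.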

answered Sep 19 '22 at 16:09 by Kevin S
