hdf5 file to pandas dataframe

I downloaded a dataset that is stored in .h5 files. I need to keep only certain columns and be able to manipulate the data.

To do this, I tried to load it into a pandas DataFrame using:

pd.read_hdf(path)

But I get: No dataset in HDF5 file.

I found an answer on SO (read HDF5 file to pandas DataFrame with conditions), but I don't need conditions, and that answer depends on how the file was written. I'm not the creator of the file, so I can't do anything about that.

I've also tried using h5py:

df = h5py.File(path)

But the resulting object is not easy to manipulate, and I can't seem to get the columns out of it (only the column names, using df.keys()). Any idea how to do this?

asked Nov 07 '16 19:11

Graham Slick

2 Answers

The easiest way to read such a file into pandas is to open it with h5py, convert the dataset you need to a NumPy array, and then build a DataFrame from that array. It would look something like:

df = pd.DataFrame(np.array(h5py.File(path)['variable_1']))
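If you need several columns rather than one, the same idea can be wrapped in a small helper. This is only a sketch: it assumes every top-level dataset in the file is 1-D and of equal length, and the helper name `h5_to_dataframe`, the demo file, and the dataset names are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def h5_to_dataframe(path, columns=None):
    """Build a DataFrame from the top-level 1-D datasets of an HDF5 file."""
    with h5py.File(path, "r") as f:
        # Keep only real datasets; groups at the top level are skipped.
        names = [k for k in f.keys() if isinstance(f[k], h5py.Dataset)]
        if columns is not None:
            names = [k for k in names if k in columns]
        # ds[()] reads the whole dataset into memory as a NumPy array.
        data = {k: f[k][()] for k in names}
    return pd.DataFrame(data)


# Demo file with hypothetical dataset names, only so the sketch is runnable.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("variable_1", data=np.arange(5))
    f.create_dataset("variable_2", data=np.linspace(0.0, 1.0, 5))

df = h5_to_dataframe(demo_path, columns=["variable_1"])
print(df)
```

Passing `columns` keeps only the datasets you care about, so the rest of the file is never loaded. If your file nests its datasets inside groups, you would have to index into the group first (for example `f["group_name"]`) before applying the same logic.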

answered Oct 01 '22 04:10

Ivan Mitevski

Pandas' HDF support requires the HDF5 file to be formatted in a very specific way. See https://stackoverflow.com/a/33644128/4128030 for more info.
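One rough way to check whether a given file is in that format is to look for the metadata pandas leaves behind. As far as I know, pandas (via PyTables) tags each stored object's group with a `pandas_type` attribute; treat that attribute name as an assumption, and the helper name `looks_like_pandas_hdf` and the demo file as hypothetical.

```python
import os
import tempfile

import h5py


def looks_like_pandas_hdf(path):
    """Return True if any group in the file carries a 'pandas_type' attribute."""
    found = []

    def visit(name, obj):
        # Assumption: pandas-written groups are marked with 'pandas_type'.
        if isinstance(obj, h5py.Group) and "pandas_type" in obj.attrs:
            found.append(name)

    with h5py.File(path, "r") as f:
        # visititems walks every group and dataset below the root.
        f.visititems(visit)
    return bool(found)


# A plain h5py file (not written by pandas), created only for the demo.
plain_path = os.path.join(tempfile.mkdtemp(), "plain.h5")
with h5py.File(plain_path, "w") as f:
    f.create_dataset("variable_1", data=[1, 2, 3])

print(looks_like_pandas_hdf(plain_path))
```

If the check comes back negative, `pd.read_hdf` is unlikely to work on the file, and reading the datasets through h5py is the more promising route.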

answered Oct 01 '22 04:10

drj

Tags: python, pandas, hdf5
