
Difference between HDF5 file and PyTables file

Is there a difference between HDF5 files and files created by PyTables?

PyTables has two functions, isHDF5File() and isPyTablesFile(), which suggests that there is a difference between the two formats.

I've done some looking around on Google and have gathered that PyTables is built on top of HDF, but I wasn't able to find much beyond that.

I am specifically interested in interoperability, speed and overhead.

Thanks.

dtlussier asked Nov 03 '11 22:11



1 Answer

PyTables files are HDF5 files.

However, as I understand it, PyTables adds some extra metadata to the attributes of each entry in the HDF5 file.
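A minimal sketch of how one might check this, assuming both PyTables 3.x and h5py are installed. In PyTables 3.x the two functions from the question are spelled is_hdf5_file() and is_pytables_file(). The file name and the helper pytables_attrs are made up for illustration. It writes a small array with PyTables and then reads the raw HDF5 attributes back through h5py:

```python
import os
import tempfile

import h5py
import numpy as np
import tables


def pytables_attrs(path):
    """Write a small array with PyTables, then list its HDF5 attributes via h5py.

    Hypothetical helper for illustration only.
    """
    with tables.open_file(path, mode="w") as h5:
        h5.create_array(h5.root, "x", np.arange(10))
    # Reopen the same file with h5py, which shows the attributes as stored.
    with h5py.File(path, "r") as f:
        root_attrs = sorted(f.attrs.keys())
        node_attrs = sorted(f["x"].attrs.keys())
    return root_attrs, node_attrs


# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo_pytables.h5")
root_attrs, node_attrs = pytables_attrs(path)

print("root attributes: ", root_attrs)
print("array attributes:", node_attrs)
print("is_hdf5_file:    ", bool(tables.is_hdf5_file(path)))
print("is_pytables_file:", bool(tables.is_pytables_file(path)))
```

The attribute names that show up (such as CLASS, VERSION, TITLE and PYTABLES_FORMAT_VERSION on the root group) are the extra metadata; the datasets themselves are ordinary HDF5.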

If you're looking for a more "vanilla" HDF5 solution for Python/NumPy, have a look at h5py.

It's less database-like (i.e. less "table-like") than PyTables, and doesn't have as many nifty querying features, but it's much more straightforward, in my opinion. If you're going to be accessing an HDF5 file from multiple different languages, h5py is probably a better route to take.
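For comparison, a rough sketch of the "vanilla" route, again assuming h5py and PyTables 3.x are both installed and using a made-up file name. It writes a dataset with h5py and then asks PyTables how it classifies the file:

```python
import os
import tempfile

import h5py
import numpy as np
import tables

# Hypothetical scratch file in a temporary directory.
plain_path = os.path.join(tempfile.mkdtemp(), "demo_h5py.h5")

# Write a plain dataset with h5py.
with h5py.File(plain_path, "w") as f:
    f.create_dataset("x", data=np.arange(10))

# List whatever attributes ended up on the dataset.
with h5py.File(plain_path, "r") as f:
    plain_attrs = list(f["x"].attrs.keys())

# Ask PyTables how it sees the file.
plain_is_hdf5 = bool(tables.is_hdf5_file(plain_path))
plain_is_pytables = bool(tables.is_pytables_file(plain_path))

print("dataset attributes:", plain_attrs)
print("is_hdf5_file:      ", plain_is_hdf5)
print("is_pytables_file:  ", plain_is_pytables)
```

The file should be recognised as HDF5 but not as a PyTables file, since it lacks the PyTables-specific attributes. That is the whole difference the two functions are reporting.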

Joe Kington answered Sep 16 '22 22:09