
Is HDF5 suitable for real-time measurements?

I would like to know whether HDF5 is suitable for real-time data logging.

More precisely: I work on a project in which we want to continuously (at sampling rates ranging from 30 to 400 Hz) log a fair amount of data (several hours) of different kinds (telemetry, signals, videos).

Data have to be written in real time (or with a small delay) so that a potential crash does not lose them.

Our first prototype is based on sqlite3, but we feel some limitations could arise from long-run usage: speed, the one-database-per-file model, and difficulty accessing the database from several threads (lock exceptions when reading and writing at the same time).

So I am considering using HDF5 as a back-end for data storage on disk (with numpy/PyTables for the in-memory representation). Do you think it is possible to update an HDF5 file on a regular basis from such a Python binding?
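For reference, PyTables does let you append rows to an open HDF5 file incrementally and flush on a schedule. A minimal sketch, assuming the `tables` package is installed; the schema, file name, and flush interval here are hypothetical:

```python
import numpy as np
import tables

# Hypothetical schema: one row per sample (timestamp + 3 telemetry channels).
class Sample(tables.IsDescription):
    timestamp = tables.Float64Col()
    telemetry = tables.Float32Col(shape=(3,))

def record(path, n_samples, rate_hz=400.0):
    """Append n_samples rows, flushing periodically so that a crash
    loses at most one flush interval of buffered rows."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "samples", Sample, "acquisition log")
        row = table.row
        for i in range(n_samples):
            row["timestamp"] = i / rate_hz
            row["telemetry"] = np.zeros(3, dtype=np.float32)
            row.append()
            if i % 100 == 99:          # flush every 100 rows (~0.25 s at 400 Hz)
                table.flush()
        table.flush()                  # push any remaining buffered rows to disk
        return table.nrows
```

Note that flushing bounds how much buffered data a crash can lose, but (as the answer below explains) it does not guarantee the HDF5 file itself stays recoverable after a crash mid-write.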

asked Jul 13 '12 by Cheatboy2

1 Answer

HDF5 packet tables ARE suitable for real-time measurements. However, you are better off writing fixed-size packets of data to a regular old POSIX file and converting later. This is because HDF5 is not very robust at the moment and does not provide the guarantees that low-level file I/O code does, and that low-level code is actually quite easy to write. At some point, when the data you are working with gets complex enough, HDF5 should come in, but be aware that relative to low-level file I/O it is heavyweight, and its global mutex means it cannot be multithreaded with reasonable determinism or performance. In addition, if the system crashes mid-write, the resulting HDF5 file is garbage/unrecoverable. This will be fixed one day, but it requires funding for the HDF Group to expedite, and it may take a decade.

My own policy is to use packet log files whenever possible, then immediately convert the result to HDF5 once recording is finished, for long-term use, compression, and access by other tools/programs. I often have the recorder dump an HDF5 file describing the binary structure at the time of writing, so I can read that file later to understand the structs in the packet log files, load the packets into memory, and hand them off to a real HDF5 file.
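The "fixed-size packets to a plain file" part of this policy can be sketched with nothing but the standard library. The packet layout below is a made-up example (sequence number, timestamp, three channels); the point is that every write is a complete fixed-size record, so after a crash every packet except possibly a torn trailing one is recoverable:

```python
import os
import struct

# Hypothetical fixed-size packet: uint32 sequence number, float64 timestamp,
# three float32 channels. "<" forces little-endian with no padding -> 24 bytes.
PACKET = struct.Struct("<Id3f")

def log_packets(path, samples):
    """Append one fixed-size packet per sample, syncing each one so a
    crash loses at most the packet currently being written."""
    with open(path, "ab", buffering=0) as f:
        for seq, (ts, a, b, c) in enumerate(samples):
            f.write(PACKET.pack(seq, ts, a, b, c))
            os.fsync(f.fileno())       # force the OS to commit to disk

def read_packets(path):
    """Recover every complete packet; a torn trailing write is skipped."""
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % PACKET.size
    return [PACKET.unpack_from(data, off)
            for off in range(0, usable, PACKET.size)]
```

The later conversion step would simply load these tuples (or `numpy.fromfile` with a matching dtype) and append them to an HDF5 table in one pass.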

With all that said, have a look at the packet table API contributed by Boeing. There is also a somewhat black-sheep C++ binding for it in the high-level (hl) C++ library that ships with HDF5, although I've had to patch it for my uses.

answered Oct 18 '22 by Jason Newton