What is a good storage candidate for soft-realtime data acquisition under Linux?

I'm building a system for data acquisition. Acquired data typically consist of 15 signals, each sampled at (say) 500 Hz. That is, approximately 15 × 500 × 4 bytes (32-bit floats) will arrive every second and have to be persisted, i.e. about 30 kB/s in total.
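For a sense of scale, the back-of-envelope throughput works out as follows (a quick sketch of the numbers stated above):

```python
# Back-of-envelope throughput for the acquisition stream described above.
SIGNALS = 15          # number of signals
SAMPLE_RATE_HZ = 500  # samples per second per signal
BYTES_PER_SAMPLE = 4  # 32-bit float

bytes_per_second = SIGNALS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
print(bytes_per_second)                   # 30000 bytes/s, i.e. ~30 kB/s
print(bytes_per_second * 3600 / 1e6)      # ~108 MB per hour of acquisition
```

So the raw data rate is modest; the challenge is fast range retrieval, not write bandwidth.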

The previous version was built on .NET (C#) using a DB4O db for data storage. This was fairly efficient and performed well.

The new version will be Linux-based, using Python (or maybe Erlang) and ... Yes! What is a suitable storage candidate?

I'm thinking MongoDB, storing each sample (or actually a bunch of them) as BSON objects. Each sample (block) will have a sample counter as a key (indexed) field, as well as a signal source identification.
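To make the idea concrete, here is a sketch of what one sample-block document might look like. The field names and block size are illustrative assumptions, not something fixed in the question:

```python
# Illustrative shape of one sample-block document (field names are
# hypothetical). Storing a block of samples per document amortises
# per-document overhead; the sample counter of the first sample in the
# block serves as the indexed key.
BLOCK_SIZE = 500  # e.g. one second of samples per block

def make_block(source_id, first_counter, samples):
    """Build a dict that could be inserted as a BSON document."""
    assert len(samples) == BLOCK_SIZE
    return {
        "source": source_id,       # signal source identification
        "counter": first_counter,  # sample counter of the block's first sample
        "samples": samples,        # list of floats for this block
    }

block = make_block("ch03", 1500, [0.0] * BLOCK_SIZE)
```

With pymongo one would then insert such dicts and create a compound index on `("source", "counter")` so range queries stay fast; that part is assumed, not shown.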

The catch is that I have to be able to retrieve samples pretty quickly. When requested, up to 30 seconds of data have to be retrieved in much less than a second, using a sample counter range and requested signal sources. The current (C#/DB4O) version manages this OK, retrieving data in much less than 100 ms.
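With fixed-size blocks keyed by their first sample counter (a layout assumed here purely for illustration), a 30-second request translates into a small, predictable set of block keys per signal source, which is what keeps retrieval fast:

```python
# Sketch: which block keys cover a requested sample-counter range,
# assuming fixed-size blocks keyed by the counter of their first sample.
BLOCK_SIZE = 500  # one second of samples per block at 500 Hz

def blocks_for_range(first, last):
    """Return the block-start counters whose blocks overlap [first, last]."""
    start_block = (first // BLOCK_SIZE) * BLOCK_SIZE
    return list(range(start_block, last + 1, BLOCK_SIZE))

# 30 s at 500 Hz = 15,000 samples -> roughly 30 blocks per signal source.
keys = blocks_for_range(7250, 7250 + 15000 - 1)
print(len(keys))  # 31: one extra block because the range is not block-aligned
```

Thirty-odd indexed lookups per source is well within a sub-100 ms budget for most stores.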

I know that Python might not be ideal performance-wise, but we'll see about that later on.

The system ("server") will have multiple acquisition clients connected, so the architecture must scale well.

Edit: After further research I will probably go with HDF5 for sample data and either Couch or Mongo for more document-like information. I'll keep you posted.

Edit: The final solution was based on HDF5 and CouchDB. It performed just fine, implemented in Python, running on a Raspberry Pi.

asked Oct 26 '12 by Micke

1 Answer

You could have a look into using HDF5. It is designed for streamed data, allows time-indexed seeking, and (as far as I know) is pretty well supported in Python.
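A minimal sketch of that approach, assuming `h5py` and `numpy` are installed: one resizable, chunked dataset per signal source, appended to as sample blocks arrive and sliced by sample index on retrieval. Dataset and file names here are made up for the example.

```python
import h5py
import numpy as np

# One resizable dataset per signal source; chunked so appends and
# contiguous range reads are cheap.
with h5py.File("acq.h5", "w") as f:
    dset = f.create_dataset(
        "ch03", shape=(0,), maxshape=(None,),
        dtype="float32", chunks=(500,),  # one chunk = one second at 500 Hz
    )
    block = np.zeros(500, dtype="float32")       # one incoming sample block
    dset.resize((dset.shape[0] + len(block),))   # grow the dataset
    dset[-len(block):] = block                   # append the block

with h5py.File("acq.h5", "r") as f:
    # Retrieving a sample-counter range is a contiguous slice, no scan.
    window = f["ch03"][0:500]
print(window.shape)  # (500,)
```

Because samples are stored in counter order, a 30-second window is a single slice per source, which is exactly the access pattern the question needs.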

answered Sep 29 '22 by sylvain.joyeux