good (noSQL?) database for physical measurements [closed]

Tags: database, nosql, couchdb, replication, key-value-store

We're building a measurement system that will eventually consist of thousands of measurement stations. Each station will save around 500 million measurements over its lifetime, each consisting of 30 scalar float values. We're now wondering how to store this data on each station, considering we'll be building a web app on each station with these requirements:

- we want to visualize the data on multiple timescales (e.g. measurements of one week, month, year)
- we need to build moving averages over the data (e.g. an average over a month to show in a year graph)
- the database needs to be crash resistant (power outages)
- we are only doing writes and reads, no updates or deletes on the data
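For the moving-average requirement, here is a minimal sketch, in plain Python and not tied to any particular database, of a trailing moving average that keeps a running sum, so each new sample costs O(1) no matter how wide the window is. The function name and the choice of a trailing (sliding) window are assumptions for illustration; a "one point per month on a year graph" view would more likely use fixed, non-overlapping buckets instead.

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average(values: Iterable[float], window: int) -> Iterator[float]:
    """Yield the trailing mean of the last `window` values seen so far.

    A running sum is kept so each step is O(1). The first `window - 1`
    outputs average over fewer samples, because the window is not full yet.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()  # the samples currently inside the window
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the oldest sample once the window overflows.
            total -= buf.popleft()
        yield total / len(buf)
```

For example, `list(moving_average([1.0, 2.0, 3.0, 4.0], 2))` gives `[1.0, 1.5, 2.5, 3.5]`.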

Additionally, we'd like one more server that can show the data of, say, 1000 measurement stations. That would be ~50TB of data in 500 billion measurements. To transmit the data from the measurement stations to the server, I thought some type of database-level replication would be a clean and efficient approach.
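A quick back-of-the-envelope check of the ~50TB figure, assuming each of the 30 values is stored as a 4-byte (single-precision) float and counting only the values themselves, with no timestamps, keys, indexes, replicas, or compression:

```python
BYTES_PER_FLOAT32 = 4      # assumption: single-precision storage
BYTES_PER_FLOAT64 = 8      # alternative: double precision
VALUES_PER_MEASUREMENT = 30
MEASUREMENTS_PER_STATION = 500_000_000
STATIONS = 1_000


def raw_payload_bytes(bytes_per_value: int) -> int:
    """Size of the value columns only: no timestamps, keys, indexes or replicas."""
    return STATIONS * MEASUREMENTS_PER_STATION * VALUES_PER_MEASUREMENT * bytes_per_value


if __name__ == "__main__":
    for label, size in (("float32", BYTES_PER_FLOAT32), ("float64", BYTES_PER_FLOAT64)):
        print(f"{label}: {raw_payload_bytes(size) / 1e12:.0f} TB")
```

This prints `float32: 60 TB` and `float64: 120 TB`, i.e. about 60 GB (or 120 GB) per station, so ~50TB is the right order of magnitude; the actual on-disk size will depend on the storage engine's per-row overhead and on compression.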

Now I'm wondering if a NoSQL solution might be better than MySQL for these purposes. CouchDB, Cassandra and maybe key-value stores like Redis look especially appealing to me. Which of those would suit the "measurement time series" data model best, in your opinion? What about other aspects like crash safety and replication from the measurement stations to the main server?


asked Nov 02 '11 11:11

Chris

1 Answer

I think CouchDB is a great database, but its ability to deal with large data is questionable. CouchDB's primary focus is on simplicity of development and offline replication, not necessarily on performance or scalability. CouchDB itself does not support partitioning, so you'll be limited by the maximum node size unless you use BigCouch or invent your own partitioning scheme.
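If you did stay with plain CouchDB, "inventing your own partitioning scheme" could start as simply as routing each station's documents to a node chosen by a stable hash of the station ID. This is a minimal sketch under that assumption: the node URLs are hypothetical placeholders, and the function only picks a node, it does not talk to CouchDB.

```python
import hashlib
from typing import Sequence

# Hypothetical node list: placeholder URLs, not real servers.
NODES = [
    "http://couch-a.example:5984",
    "http://couch-b.example:5984",
    "http://couch-c.example:5984",
]


def node_for_station(station_id: str, nodes: Sequence[str] = NODES) -> str:
    """Pick a node deterministically from a stable hash of the station ID.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would route the same station to a different
    node after a restart.
    """
    digest = hashlib.sha1(station_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

Plain modulo hashing remaps most stations whenever a node is added or removed; Dynamo-style systems such as Cassandra use consistent hashing to limit that data movement, which is part of what built-in partitioning saves you from writing yourself.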

Redis is an in-memory database. It's extremely fast and efficient at getting data in and out of RAM. It can use disk for storage, but it's not terribly good at it. It's great for bounded quantities of data that change frequently. Redis does have replication, but it has no built-in support for partitioning, so again, you'll be on your own here.

You also mentioned Cassandra, which I think is more on target for your use case. Cassandra is well suited for databases that grow indefinitely, which was essentially its original use case. Partitioning and availability are baked in, so you won't have to worry about them very much. The data model is also a bit more flexible than the average key/value store: it adds a second dimension of columns and can practically accommodate millions of columns per row. This allows time-series data to be "bucketed" into rows that cover time ranges, for example. The distribution of data across the cluster (partitioning) is done at the row level, so only one node is necessary to perform operations within a row.
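To make the bucketing idea concrete, here is a small sketch of one possible layout, assuming one row per station per UTC day with the measurement timestamp as the column name. The `station:YYYY-MM-DD` key format and the helper names are hypothetical, not something Cassandra prescribes; the point is that a time-range query maps to a short, predictable list of row keys, each of which lives on a single node.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def row_key(station_id: str, ts: datetime) -> str:
    """Row key for one station-day, e.g. 'station-7:2011-11-02'.

    Hypothetical key format. Expects a timezone-aware datetime; the day
    boundary is taken in UTC so all stations bucket consistently.
    """
    return f"{station_id}:{ts.astimezone(timezone.utc):%Y-%m-%d}"


def row_keys_for_range(station_id: str, start: datetime, end: datetime) -> List[str]:
    """All day-bucket row keys that a query for [start, end] has to read."""
    day = start.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    last = end.astimezone(timezone.utc)
    keys = []
    while day <= last:
        keys.append(row_key(station_id, day))
        day += timedelta(days=1)
    return keys
```

For example, a query from 2011-11-01 12:00 to 2011-11-03 06:00 UTC for `station-7` reads the rows `station-7:2011-11-01`, `station-7:2011-11-02` and `station-7:2011-11-03`. Bucket size is a trade-off: larger buckets mean fewer rows per query but wider rows.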

Hadoop plugs right into Cassandra, with "native drivers" for MapReduce, Pig, and Hive, so it could potentially be used to aggregate the collected data and materialize the running averages. The best practice is to shape data around queries, so you'll probably want to store multiple copies of the data in "denormalized" form, one for each type of query.
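The answer suggests Hadoop for materializing the averages in batch. As a simpler illustration of the same denormalized shape, this sketch writes each value twice at write time: once into a raw table and once into a per-day rollup that stores a count and a sum. The dicts are in-memory stand-ins for two hypothetical column families, and each measurement is reduced to a single float for brevity.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import DefaultDict, List, Tuple

# In-memory stand-ins for two hypothetical column families,
# each shaped around one kind of query.
raw_table: DefaultDict[str, List[Tuple[datetime, float]]] = defaultdict(list)
daily_count: DefaultDict[Tuple[str, str], int] = defaultdict(int)
daily_sum: DefaultDict[Tuple[str, str], float] = defaultdict(float)


def write_measurement(station_id: str, ts: datetime, value: float) -> None:
    """Store one value twice: raw for detailed views, and folded into its daily rollup."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    raw_table[station_id].append((ts, value))
    daily_count[(station_id, day)] += 1
    daily_sum[(station_id, day)] += value


def daily_mean(station_id: str, day: str) -> float:
    """Answer a coarse-timescale query from the rollup, without scanning raw data."""
    key = (station_id, day)
    count = daily_count.get(key, 0)
    if count == 0:
        raise KeyError(f"no data for {station_id} on {day}")
    return daily_sum[key] / count
```

Keeping `(count, sum)` instead of a precomputed mean means daily rows can be combined exactly into monthly or yearly averages for the year graph.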

Check out this post on doing time-series in Cassandra:

http://rubyscale.com/2011/basic-time-series-with-cassandra/


answered Sep 28 '22 07:09

rbranson

Related questions

- Which DB should I use?
- Storing & Querying Heirarchical Data with Multiple Parent Nodes
- Implementation of "Most Viewed" feature database
- Replicating / Cloning data from one MS SQL Server to another
- Sync large local DB with server DB (MySQL)
- How to publish an application update on android-market without deleting local database and local files?
- Scrollable ResultSet JDBC Postgresql
- How does Facebook handle Privacy settings in database side?
- Does Ruby on Rails have a way to import an existing database schema?
- What would you log in a write-ahead log?
- What types of Databases are available in Android?
- Best way to access a remote database: via webservice or direct DB-access?
- How to put all countries/cities/states of the world into my database?
- Selective get in cassandra faster than normal get?
- Using memcached as a database buffer for chat messages
- InnoDB disk usage overhead
- Is multiple field index in MySQL a good choice?
- inserting data into multiple tables using a web form
- Wordpress Taxonomy - how does it know which object_id?
- Comparing two tables of local database using C#
