So there's this cool new thing, these NoSQL databases. And then there's my data: rows upon rows of meteorological data, i.e. values representing a certain measurement at a certain station (identified by a WMO number, not coordinates) at a certain time.
Not every station measures every parameter, not every parameter is measured all the time.
I currently store this data (30 years' worth of hourly values, resulting in ~1 billion values) in MySQL. The continuous growth and the foreseeable addition of even more data give me a bit of a headache.
Reading about the document-based NoSQL systems, which seem to scale rather easily, I was wondering whether NoSQL is a viable storage concept for meteorological data too. Do you have any experience with this?
Update: I forgot to mention the typical queries. Most of them need data along the time axis, e.g. give me the temperatures of station 066310 from 01.01.2010 00:00 to 01.03.2010 00:00.
Or: give me the most recent values of all parameters of a particular station.
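To make the two query shapes above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are made up for illustration and are not my actual MySQL schema. The dates are read as day.month.year and the end of the range is treated as exclusive, which is also an assumption. The composite primary key (station, parameter, ts) is what would let a relational engine answer both queries from one index.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurement (
        station   TEXT NOT NULL,   -- WMO number, e.g. '066310'
        parameter TEXT NOT NULL,   -- e.g. 'temperature'
        ts        TEXT NOT NULL,   -- ISO 8601 text, sorts chronologically
        value     REAL NOT NULL,
        PRIMARY KEY (station, parameter, ts)
    )
""")

# Invented sample rows; note that not every parameter exists at every time.
rows = [
    ("066310", "temperature", "2009-12-31T23:00", -1.5),
    ("066310", "temperature", "2010-01-01T00:00", -2.0),
    ("066310", "temperature", "2010-01-01T01:00", -2.3),
    ("066310", "pressure",    "2010-01-01T00:00", 1013.2),
    ("066310", "temperature", "2010-03-01T00:00", 4.1),
    ("066320", "temperature", "2010-01-01T00:00", 0.4),
]
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", rows)


def time_series(station, parameter, start, end):
    """Values of one parameter at one station for start <= ts < end."""
    return conn.execute(
        "SELECT ts, value FROM measurement "
        "WHERE station = ? AND parameter = ? AND ts >= ? AND ts < ? "
        "ORDER BY ts",
        (station, parameter, start, end),
    ).fetchall()


def latest_values(station):
    """Most recent value of every parameter measured at one station."""
    return conn.execute(
        "SELECT m.parameter, m.ts, m.value FROM measurement AS m "
        "JOIN (SELECT parameter, MAX(ts) AS ts FROM measurement "
        "      WHERE station = ? GROUP BY parameter) AS last "
        "  ON last.parameter = m.parameter AND last.ts = m.ts "
        "WHERE m.station = ? ORDER BY m.parameter",
        (station, station),
    ).fetchall()


series = time_series("066310", "temperature",
                     "2010-01-01T00:00", "2010-03-01T00:00")
latest = latest_values("066310")
```

Both lookups filter on a leading prefix of the same key, so whatever store I end up with would need to support ordered range scans over something like (station, parameter, time).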
NoSQL could be a fit when your data structure is simple and predictable (for example, a plain key-value store) and you need neither relational integrity nor ad-hoc or advanced querying.
What you win in easy scalability you might lose in flexibility and consistency though.
The biggest problem would be the lack of an easy means for composing complex queries over your data. I would say meteorological data is not the best candidate for NoSQL.
I personally prefer PostgreSQL over MySQL and find it very scalable (even with millions or even billions of rows) when set up correctly.
I think you should try a full-featured, mature DBMS before giving up on SQL.
See for instance:
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/
http://www.yafla.com/dforbes/The_Impact_of_SSDs_on_Database_Performance_and_the_Performance_Paradox_of_Data_Explodification/