I have recently been researching NoSQL options. My scenario is as follows:
We collect and store data from custom hardware at remote locations around the world. We record data from every site every 15 minutes, and would eventually like to move to every 1 minute. Each record has between 20 and 200 measurements. Once set up, the hardware records and reports the same measurements every time.
The biggest issue we are facing is that we get a different set of measurements from every project. We measure about 50-100 different measurement types; however, any project can have any number of each type of measurement. There is no preset set of columns that can accommodate the data. Because of this, we create and build each project's data table with the exact columns it needs as we set up and configure the project on the system.
We provide tools to help analyze the data. This typically includes more calculations and data aggregation, some of which we also store.
We are currently using a MySQL database with a table for each client. There are no relations between the tables.
NoSQL seems promising because we could store a project_id and a timestamp, and the rest of the fields would not need to be preset. This would mean one table, more relationships in the data, and still handle the variety of measurements.
Is a NoSQL solution right for this job? If so, which ones?
I have been investigating MongoDB and it seems promising...
Example for Clarification:
Project 1 has 5 data points recorded; its MySQL table columns look like: timestamp, temp, wind speed, precipitation, irradiance, wind direction
Project 2 has 3 data points recorded; its MySQL table columns look like: timestamp, temp, irradiance, temp2
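To illustrate, here is a rough pymongo sketch of how both projects could share a single MongoDB collection (the database and collection names, and the values, are made up for illustration):

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
readings = client["telemetry"]["readings"]         # hypothetical db/collection

# Project 1: five measurements per record
readings.insert_one({
    "project_id": 1,
    "timestamp": datetime(2013, 1, 1, 0, 15, tzinfo=timezone.utc),
    "temp": 21.4,
    "wind_speed": 3.2,
    "precipitation": 0.0,
    "irradiance": 512.0,
    "wind_direction": 270,
})

# Project 2: three measurements per record -- no schema change needed
readings.insert_one({
    "project_id": 2,
    "timestamp": datetime(2013, 1, 1, 0, 15, tzinfo=timezone.utc),
    "temp": 18.9,
    "irradiance": 488.0,
    "temp2": 19.3,
})

# One compound index covers the common "time series for one project" query
readings.create_index([("project_id", 1), ("timestamp", 1)])
```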
The simple answer is that there is no simple answer to this sort of problem; the only way to find out what works for your scenario is to invest R&D time into it.
The question is hard to answer because the performance requirements aren't spelled out by the OP. It appears to be roughly 75M records/year across a number of customers, with a write rate of num_customers × 1 record/minute (which is low); as a back-of-envelope check, one record per minute is about 525,600 records per project per year, so 75M/year would correspond to roughly 140 projects. However, I don't have figures for the required read/query performance.
Effectively you already have a sharded database using horizontal partitioning, because you're storing each customer in a separate table. This is good and will increase performance. However, you haven't yet established that you have a performance problem, so that needs to be measured and the problem size assessed before you can fix it.
A NoSQL database is indeed a good way of fixing performance problems with a traditional RDBMS, but it will not provide automatic scalability and is not a general solution. You need to identify your performance problem first, and then design the (NoSQL) data model that provides the fix.
Depending on what you're trying to achieve I'd look at MongoDB, Apache Cassandra, Apache HBase or Hibari.
Remember that NoSQL is a vague term typically encompassing very different designs: key-value stores, document stores, column-family stores and graph databases.
So, in the first instance, I'd see whether a traditional RDBMS can achieve the required performance using all available techniques; get a copy of High Performance MySQL and read the MySQL Performance Blog.
In light of your comments, I think it is fair to say that you could achieve what you want with one of the above NoSQL engines.
My primary recommendation would be to get your data model designed and implemented; what you're using at the moment isn't really right.
So look at the Entity-Attribute-Value (EAV) model, as I think it is exactly right for what you need.
You need to get your data model right before you can consider which technology to use; to be honest, dynamically modifying schemas isn't a data model.
I'd use a traditional SQL database to validate and test the new data model, as the management tools are better and it's generally easier to work with schemas as you refine the data model.
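As a minimal sketch of that EAV model (using Python's sqlite3 purely so it runs anywhere; the table and column names are only suggestions, and the same schema maps directly onto MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per measurement type, shared across all projects
    CREATE TABLE measurement_type (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE          -- e.g. 'temp', 'wind speed'
    );

    -- One narrow row per (project, timestamp, measurement): the schema
    -- never changes no matter which measurements a project reports
    CREATE TABLE reading (
        project_id          INTEGER NOT NULL,
        ts                  TEXT    NOT NULL,
        measurement_type_id INTEGER NOT NULL REFERENCES measurement_type(id),
        value               REAL,
        PRIMARY KEY (project_id, ts, measurement_type_id)
    );
""")

conn.executemany("INSERT INTO measurement_type (id, name) VALUES (?, ?)",
                 [(1, "temp"), (2, "wind speed"), (3, "irradiance")])

# Project 1 and project 2 report different measurement sets,
# yet both fit the same two tables
conn.executemany("INSERT INTO reading VALUES (?, ?, ?, ?)",
                 [(1, "2013-01-01T00:15", 1, 21.4),
                  (1, "2013-01-01T00:15", 2, 3.2),
                  (2, "2013-01-01T00:15", 1, 18.9),
                  (2, "2013-01-01T00:15", 3, 488.0)])

# Pivot back to a per-project time series when reading
for row in conn.execute("""
        SELECT r.ts, m.name, r.value
        FROM reading r
        JOIN measurement_type m ON m.id = r.measurement_type_id
        WHERE r.project_id = 1
        ORDER BY r.ts"""):
    print(row)
```

The point is that the reading table never changes shape: a project with 5 measurements and a project with 3 both fit the same two tables, and you pivot back to a wide time series at query time.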