Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Need Advice: Is this a good use case for a 'NoSQL' Database? If so, which one?

I have recently been researching NoSql options. My scenario is as follows:

We collect and store data from custom hardware at remote locations around the world. We record data from every site every 15 minutes. We would eventually like to move to every 1 minute. Each record has between 20 and 200 measurements. Once set up the hardware records and reports the same measurements every time.

The biggest issue we are facing is that we get a different set of measurements from every project. We measure about 50-100 different measurement types, however any project can have any number of each type of measurement. There is no preset set of columns that can accommodate the data. Because of this we create and build each projects data table with the exact columns it needs as we set up and configure the project on the system.

We provide tools to help analyze the data. This typically includes more calculations and data aggregation, some of which we also store.

We are currently using a mysql database with a table for each client. There are no relations between tables.

NoSql seems promising because we could store a project_id, timestamp then the rest would not be preset. This means one table, more relationships in the data, yet still handling the variety of measurements.

Is a 'NoSql' solution right for this job? If so which ones?

I have been investigation MongoDB and it seems promising...

Example for Clarification:

Project 1 has 5 data points recorded, the mysql table columns look like: timestamp, temp, wind speed, precipitation, irradiance, wind direction

Project 2 has 3 data points recorded mysql table columns: timestamp, temp, irradiance, temp2

like image 230
Alan Peabody Avatar asked Sep 24 '10 17:09

Alan Peabody


1 Answers

The simple answer is that there is no simple answer to these sort of problems, the only way to find out what works for your scenario is to invest R&D time into it.

The question is hard to answer because the performance requirements aren't spelled out by the OP. It appears to be 75M/year records over a number of customers with a write rate of num_customers*1minute (which is low), but I don't have figures for the required read / query performance.

Effectively you have already a sharded database using horizontal partitioning because you're storing each customer in a seperate table. This is good and will increase performance. However you haven't yet established that you have a performance problem, so this needs to be measured and the problem size assessed before you can fix it.

A NoSQL database is indeed a good way of fixing performance problems with traditional RDBMS, but it will not provide automatic scalabity and is not a general solution. You need to find your performance problem fix and then design the (nosqL) data model to provide the solution.

Depending on what you're trying to achieve I'd look at MongoDB, Apache Cassandra, Apache HBase or Hibari.

Remember that NoSQL is a vague term typically encompassing

  • Applications that are either performance intensive in read or write. Often sacrificing read or write performance at the expense of the other.
  • Distribution and scalability
  • Different methods of persistency (RAM/Disk)
  • A more structured/defined access pattern making ad-hoc queries harder.

So, in the first instance I'd see if a traditional RDBMS can achieve the required performance, using all available techniques, get a copy of High Performance MySQL and read MySQL Performance Blog.

Rev1:

In light of your comments I think it is fair to say that you could achieve what you want with one of the above NOSQL engines.

My primary recommendation would be to get your data model designed and implemented, what you're using at the moment isn't really right.

So look at Entity-attribute-value model as I think it is exactly right for what you need.

You need to get your data model right before you can consider which technology to use, being honest modifying schemas dynamically isn't a datamodel.

I'd use a traditional SQL database to validate and test the new datamodel as the management tools are better and it's generally easier to work with the schemas as you refine the datamodel.

like image 173
Richard Harrison Avatar answered Nov 15 '22 10:11

Richard Harrison