Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NoSQL vs. Relational Databases vs. Possible Hybrid

I'm hearing more about NoSQL, but have yet had someone give me a clear explanation of how it is to be used instead of relational databases.

I've read that it can't do left joins, so I was trying to figure out how you'd be able to use such a data storage. From reading: Preserve Joins by code in MongoDB it seems like a suggestion is to just make a large table, as if you already did the joins on it.

If the above statement is true, then I can see how it can be used. However I'm curious on how you'd handle repeat data. As the concept of normalizing, helps you remove the redundancy and ensure consistency in the data (e.g. Slight modifications like capitalization, white space, etc)...

Are we simply sacrificing the consistency of the data for scalable speed, or am I missing something?

Edit

I've been doing some more digging and found the answers the following questions useful for clarifying my understanding:

  • Why Google's BigTable referred as a NoSQL database?
  • How do you track record relations in NoSQL?

My understanding of consistency seems to be correct from those answers. And it looks like NoSQL is suppose to be used for specific problems types and that if you need relations that you should use a relational database.

But this raises more questions like:

  1. It makes me wonder about real life examples of when to use NoSQL versus when not to?
  2. By denormalizing the data, you should be able to solve all of the same problems that relational databases do... But there are rules on how to normalize data with relational databases. Are there rules that one can use to help them denormalize the data to use a NoSQL solution?
  3. Any examples on when you might want to consider using both a NoSQL solution in parallel with a relational database?
like image 476
James Oravec Avatar asked Oct 20 '13 20:10

James Oravec


1 Answers

MongoDB has the ability to have documents which include arrays of other documents. This solves many cases where you would have relations in reational databases.

When an invoice has multiple positions, you wouldn't put these positions into a separate collection. You would embed them as an array.

It makes me wonder about real life examples of when to use NoSQL versus when not to?

There are many different NoSQL databases, each one designed with different use-cases in mind. But you tagged this question as MongoDB, so I assume that you mean MongoDB in particular.

MongoDB has two main advantages over relational databases.

First, it scales well.

When the database is too slow or too big, you can easily add more servers by creating a cluster or replica-set of multiple shards. This doesn't work nearly as well with most relational databases.

Second, it allows heterogeneous data.

Imagine, for example, the product database of a computer hardware store. What properties do products have? All products have a price and a vendor. But CPUs have a clock rate, hard drives and RAM chips have a capacity (and these capacities aren't comparable), monitors have a resolution and so on. How would you design this in a relational database? You would either create a very long productID-property-value table or you would create a very wide and sparse product table with every property you can imagine, but most of them being NULL for most products. Both solutions aren't really elegant. But MongoDB can solve this much better because it allows each document in a collection to have a different set of properties.

What can't it do?

As a rather new technology, there isn't that much literature about it. The software ecosystem around it isn't that well either. The tools you can get for relational databases are often much more shiny.

There are also some use-cases MongoDB isn't well-suited for.

  • MongoDB doesn't do JOINs. When your data is very relational and denormalizing it would be counter-productive, it might be a poor choice for your product. But you might want to take a look at graph databases like Neo4j, which focus even more on relations than relational databases. Update 2016: MongoDB 3.2 now has rudimentary JOIN support with the $lookup aggregation stage, but it's still very limited in functionality compared to relational and graph databases.
  • MongoDB doesn't do transactions. At least not complex transactions. Certain actions which only affect a single document are guaranteed to be atomic, but as soon as you affect more than one document, you can't guarantee that no other query will happen in-between and find an inconsistent state.
  • MongoDB is bad for ad-hoc reporting. Its options for data-mining are severely limited. The rather new aggregation functions help and MapReduce can also solve some surprisingly complex problems when you learn to use it smart, but SQL has usually the better tools for things like that.

By denormalizing the data, you should be able to solve all of the same problems that relational databases do... But there are rules on how to normalize data with relational databases. Are there rules that one can use to help them denormalize the data to use a NoSQL solution?

Relational databases are around for about 40 years. Their theory is a well-researched topic in computer science. There are whole libraries of books written about the theory behind them. There is a by-the-book solution for every imaginable corner-case by now.

But NoSQL databases, on the other hand, are a rather new technology. We are still figuring out the best practices. The most frequent advise is: "Use your own head. Think about what queries are performed most often, and optimize your data schema for them."

Any examples on when you might want to consider using both a NoSQL solution in parallel with a relational database?

When possible I would advise against using two different database technologies in the same product:

  • Anyone who maintains and supports the product must be familiar with both technologies
  • Troubleshooting gets a lot harder
  • The sysadmins need to keep an additional database running and updated
  • You have an additional point of failure which can lead to downtime

I would only recommend to mix database technologies when fulfilling your requirements without it doesn't just become hard but physically impossible. Otherwise, make your pick and stay with it.

like image 121
Philipp Avatar answered Nov 03 '22 01:11

Philipp