
When to replace RDBMS/ORM with NoSQL [closed]

What kind of projects benefit from using a NoSQL database instead of an RDBMS wrapped by an ORM?

Examples:

  • Sites similar to Stack Overflow?
  • Social communities?
  • Forums?
asked Aug 19 '10 13:08 by jgauffin




1 Answer

Your question is very general. NoSQL describes a collection of database techniques that are very different from each other. Roughly, there are:

  • Key-value stores (Redis, Riak)
  • Triplestores (AllegroGraph)
  • Column-family stores (Bigtable, Cassandra)
  • Document-oriented stores (CouchDB, MongoDB)
  • Graph databases (Neo4j)

A project can benefit from the use of a document database during the development phase of the project, because you won't have to design complex entity-relation diagrams or write complex join queries. I've detailed other uses of document databases in this answer.
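As a rough illustration of the "no join queries" point, here is a minimal Python sketch that uses plain dicts and lists as stand-ins for tables and documents. The post and comment data are made up for the example, and no real database API is involved.

```python
import json

# Relational shape: rows in two "tables", stitched together by a join on post_id.
posts = [{"id": 1, "title": "When to use NoSQL?"}]
comments = [
    {"post_id": 1, "author": "ann", "text": "Depends on the data model."},
    {"post_id": 1, "author": "bob", "text": "And on the queries."},
]

# A hand-written equivalent of joining posts to their comments.
joined = [
    {
        **post,
        "comments": [
            {"author": c["author"], "text": c["text"]}
            for c in comments
            if c["post_id"] == post["id"]
        ],
    }
    for post in posts
]

# Document shape: the same post stored and fetched as one self-contained unit.
document = {
    "id": 1,
    "title": "When to use NoSQL?",
    "comments": [
        {"author": "ann", "text": "Depends on the data model."},
        {"author": "bob", "text": "And on the queries."},
    ],
}

# A document store would persist something like this JSON text as a single record.
serialized = json.dumps(document)
```

Both shapes carry the same information. The document form trades the join for nesting, which is convenient early in development but makes cross-document queries harder later on.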

If your application needs to handle very large amounts of data, the development phase will likely be longer when you use a specialized NoSQL solution such as Cassandra. However, when your application goes into production, it will greatly benefit from the performance and scalability of Cassandra.

Very generally speaking, if an application has the following requirements:

  • scale horizontally
  • work with data model X
  • perform Y operations

the application will benefit from using a NoSQL solution that is geared towards storing data model X and performing Y operations on the data. If you need more specific answers regarding a certain type of NoSQL database, you'll need to update your question.

  1. Benefits during development (e.g. easier to use than SQL, no licensing costs)?
  2. Benefits in terms of performance (e.g. runs like hell with a million concurrent users)?
  3. What type of NoSQL database?

Update

Key-value stores can only be queried by key in most cases. They're useful to store simple data, such as user sessions, simple profile data or precomputed values and output. Although it is possible to store more complex data in key-value pairs, it burdens the application with the responsibility of maintaining 'manual' indexes in order to perform more advanced queries.
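To make the 'manual' index burden concrete, here is a minimal Python sketch. A plain dict stands in for the key-value store, and the key naming scheme (`user:<id>`, `index:city:<name>`) and the helper functions are assumptions made for this example, not the API of any particular product.

```python
# A plain dict stands in for a key-value store (an assumption for illustration;
# a real store would be reached through its own client library).
store = {}

def put_user(user_id, profile):
    """Store a profile and keep a hand-maintained index on the 'city' field."""
    old = store.get(f"user:{user_id}")
    if old is not None:
        # Remove the stale index entry before writing the new one.
        store[f"index:city:{old['city']}"].discard(user_id)
    store[f"user:{user_id}"] = profile
    store.setdefault(f"index:city:{profile['city']}", set()).add(user_id)

def users_in_city(city):
    """'Query' by city: one lookup in the index, then one lookup per user."""
    ids = store.get(f"index:city:{city}", set())
    return sorted(store[f"user:{uid}"]["name"] for uid in ids)

put_user(1, {"name": "Ann", "city": "Oslo"})
put_user(2, {"name": "Bob", "city": "Oslo"})
put_user(2, {"name": "Bob", "city": "Rome"})  # Bob moves; the index must follow
```

Every write has to update the index as well as the value. If the application forgets to do so, the index silently goes stale, which is exactly the responsibility a database with built-in secondary indexes would take off your hands.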

Triplestores are for storing Resource Description Framework (RDF) metadata. I don't know anything about these stores beyond what Wikipedia tells me, so you'll have to do some research on that.

Column-family stores are built for storing and processing very large amounts of data. They are used by Google's search engine and Facebook's inbox search. The data is queried by MapReduce functions. Although MapReduce functions may be hard to grasp in the beginning, the concept is quite simple. Here's an analogy which (hopefully) explains the concept:

Imagine you have multiple shoe-boxes filled with receipts, and you want to calculate your total expenses. You invite some of your friends over and assign a person to each shoe-box. Each person writes down the total of each receipt in his shoe-box. This process of selecting the required data is the Map part.

When a person has written down the totals of (some of) his receipts, he can sum up these totals. This is the Reduce part and can be repeated multiple times until all receipts have been handled. In the end, all of your friends come together and sum up their total sums, giving you your total expenses. That's the final Reduce step.

The advantage of this approach is that you can have any number of shoe-boxes and you can assign any number of people to a shoe-box and still end up with the same result. Each shoe-box can be seen as a server in the database's network. Each friend can be seen as a thread on the server. With MapReduce you can have your data distributed across many servers and have each server handle part of the query, optimizing the performance of your database.
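The shoe-box analogy can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular column-family store implements MapReduce: the receipt data is invented, amounts are in cents, and a thread pool plays the role of the friends.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: each inner list is one shoe-box of receipts (totals in cents).
shoe_boxes = [
    [{"shop": "grocer", "cents": 1250}, {"shop": "baker", "cents": 325}],
    [{"shop": "garage", "cents": 4000}],
    [{"shop": "cinema", "cents": 900}, {"shop": "grocer", "cents": 2025}],
]

def map_receipt(receipt):
    """Map: pick out the one value we care about from a receipt."""
    return receipt["cents"]

def add(a, b):
    """Reduce: combine two partial results into one."""
    return a + b

def process_box(box):
    """One friend handles one shoe-box: map every receipt, then reduce."""
    return reduce(add, map(map_receipt, box), 0)

# Each worker thread is a friend working through a shoe-box independently.
with ThreadPoolExecutor() as friends:
    box_sums = list(friends.map(process_box, shoe_boxes))

# Final reduce: the friends come together and add up their partial sums.
total_expenses = reduce(add, box_sums, 0)
```

Because addition is associative, the receipts can be regrouped into any number of boxes without changing `total_expenses`. That property is what lets the work be spread across servers.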

Document-oriented stores are explained in this question, so I won't discuss them here.

Graph databases are for storing networks of highly connected objects, such as the users of a social network. These databases are optimized for graph operations, such as finding the shortest path between two nodes, or finding all nodes within three hops of the current node. Such operations are quite expensive on relational databases or other NoSQL databases, but very cheap on graph databases.
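For a feel of what "all nodes within three hops" means, here is a small in-memory Python sketch using breadth-first search. The friendship graph and the `hops_from` helper are hypothetical. A graph database would run this kind of traversal natively over its stored data instead of in application code.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency list.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan", "fay"],
    "fay": ["eve"],
}

def hops_from(start, max_hops=None):
    """Breadth-first search: shortest hop count from start to each node reached."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_hops is not None and hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in friends[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops

# Everyone within three hops of ann, excluding ann herself.
within_three = sorted(n for n, h in hops_from("ann", max_hops=3).items() if h > 0)

# Shortest path length between two nodes falls out of the same search.
ann_to_fay = hops_from("ann")["fay"]
```

On a relational schema, each extra hop typically means another self-join on the friendship table, which is why this kind of query gets expensive there.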

answered Oct 12 '22 23:10 by Niels van der Rest
