Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB beginner - to normalize or not to normalize?

I'm going to try and make this as straight-forward as I can.
Coming from MySQL and thinking in terms of tables, let's use the following example:

Let's say that we have a real-estate website and we're displaying a list of houses
normally, I'd use the following tables:

  • houses - the real estate asset at hand
  • owners - the owner of the house (one-to-many relationship with houses)
  • agencies - the real-estate broker agency (many-to-many relationship with houses)
  • images - many-to-one relationship with houses
  • reviews - many-to-one relationship with houses

I understand that MongoDB gives you the flexibility to design your web-app in different collections with unique IDs much like a relational database (normalized), and to enjoy quick selections, you can nest within a collection, related objects and data (un-normalized).

Back to our real-estate houses list, the query used to populate it is quite expensive in a normal relational DB, for each house you need to query its images, reviews, owner & agencies, each entity resides in a different table with its fields, you'd probably use joins and have multiple queries joined into one - Expensive!

Enter MongoDB - where you don't need joins, and you can store all the related data of a house in a house item on the houses collection, selection was never faster, it's a db heaven!
But what happens when you need to add/update/delete related reviews/agencies/owner/images?

This is a mystery to me, and if I need to guess, each related collection exist on its own collection on top of its data within the houses table, and once one of these pieces of related data is being added/updated/deleted you'll have to update it on its own collection as well as on the houses collection. Upon this update - do I need to query the other collections as well to make sure I'm updating the house record with all the updated related data?
I'm just guessing here and would really appreciate your feedback.

Thanks,
Ajar

like image 808
Ajar Avatar asked Jul 13 '13 05:07

Ajar


People also ask

Should I normalize data in MongoDB?

1 Answer. Show activity on this post. Normalizing your data like you would with a relational database is usually not a good idea in MongoDB. Normalization in relational databases is only feasible under the premise that JOINs between tables are relatively cheap.

Should you always normalize database?

It is important that a database is normalized to minimize redundancy (duplicate data) and to ensure only related data is stored in each table. It also prevents any issues stemming from database modifications such as insertions, deletions, and updates. The stages of organization are called normal forms.

What happens if you don't normalize a database?

A poorly normalized database and poorly normalized tables can cause problems ranging from excessive disk I/O and subsequent poor system performance to inaccurate data. An improperly normalized condition can result in extensive data redundancy, which puts a burden on all programs that modify the data.

Should you normalize a NoSQL database?

Even some NoSQL databases like Cassandra encourage a very normalized approach to storing data. Normalization typically entails creating a series of tables, each of which can have a different set of fields, but where each record in a given table must have a value for all of its fields — no more, no less.

What is normalization in MongoDB?

I've read many articles and grasped a part of it . Normalization, which is increasing the complexity of the schema by splitting tables into multiple smaller ones to reduce the data redundancy ( 1NF, 2NF, 3NF) . But Mongo follows the exact opposite way of what we do with SQL.

How is data stored in MongoDB?

The data is stored in a JSON-like format and can contain different kinds of structures. For example, in the same collection we can have the next two documents: To get the best out of MongoDB, you have to understand and follow some basic database design principles.

What is normalization in DBMS?

Normalization - normalizing means storing data into multiple collections with references between them. The data is defined once, making the writing tasks (update) easy. When it comes to reading tasks, normalization has its downsides. If you want to receive data from multiple collections, you have to perform multiple queries making the reads slower.

How to improve performance of MongoDB database?

You have to write your own joins. You have to model your indexes very carefully. You also have to be very aware of "write safety". MongoDB has highly configurable levels of "how much do I care about this write". This needs to be part of your code & configuration and it typically needs to be customized correctly or performance will suffer.


3 Answers

Try this approach:

Work out which entity (or entities) are the hero(s)

With 'hero', I mean the entity(s) that the database is centered around. Let's take your example. The hero of the real-estate example is the house*.

Work out the ownerships

Go through the other entities, such as the owner, agency, images and reviews and ask yourself whether it makes sense to place their information together with the house. Would you have a cascading delete on any of the foreign keys in your relational database? If so, then that implies ownership.

Work out whether it actually matters that data is de-normalised

You will have agency (and probably owner) details spread across multiple houses. Does that matter?

Your house collection will probably look like this:

house: {
owner,
agency,
images[], // recommend references to GridFS here
reviews[] // you probably won't get too many of these for a single house
}

*Actually, it's probably the ad of the house (since houses are typically advertised on a real-estate website and that's probably what you're really interested in) so just consider that

like image 123
martingreber Avatar answered Oct 23 '22 09:10

martingreber


Sarah Mei wrote an informative article about the kinds of issues that can arise with data integrity in nosql dbs. The choice between duplicate data or using id's, code based joins and the challenges with keeping data integrity. Her take is that any nosql db with code based joins will lose data integrity at some point. Imho the articles comments are as valuable as the article itself in understanding these issues and possible resolutions.

Link: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/comment-page-1/

like image 37
Agent Zebra Avatar answered Oct 23 '22 11:10

Agent Zebra


I would just like to give a normalization refresher from the MongoDB's perspective -

What are the goals of normalization?

  • Frees the database from modification anomalies - For MongoDB, it looks like embedding data would mostly cause this. And in fact, we should try to avoid embedding data in documents in MongoDB which possibly create these anomalies. Occasionally, we might need to duplicate data in the documents for performance reasons. However that's not the default approach. The default is to avoid it.
  • Should minimize re-design when extending - MongoDB is flexible enough because it allows addition of keys without re-designing all the documents
  • Avoid bias toward any particular access pattern - this is something, we're not going to worry about when describing schema in MongoDB. And one of the ideas behind the MongoDB is to tune up your database to the applications that we're trying to write and the problem we're trying to solve.
like image 29
xameeramir Avatar answered Oct 23 '22 11:10

xameeramir