I'm going to try and make this as straight-forward as I can.
Coming from MySQL and thinking in terms of tables, let's use the following example:
Let's say that we have a real-estate website and we're displaying a list of houses
normally, I'd use the following tables:
I understand that MongoDB gives you the flexibility to design your web-app in different collections with unique IDs much like a relational database (normalized), and to enjoy quick selections, you can nest within a collection, related objects and data (un-normalized).
Back to our real-estate houses list, the query used to populate it is quite expensive in a normal relational DB, for each house you need to query its images, reviews, owner & agencies, each entity resides in a different table with its fields, you'd probably use joins and have multiple queries joined into one - Expensive!
Enter MongoDB - where you don't need joins, and you can store all the related data of a house in a house item on the houses collection, selection was never faster, it's a db heaven!
But what happens when you need to add/update/delete related reviews/agencies/owner/images?
This is a mystery to me, and if I need to guess, each related collection exist on its own collection on top of its data within the houses table, and once one of these pieces of related data is being added/updated/deleted you'll have to update it on its own collection as well as on the houses collection. Upon this update - do I need to query the other collections as well to make sure I'm updating the house record with all the updated related data?
I'm just guessing here and would really appreciate your feedback.
Thanks,
Ajar
1 Answer. Show activity on this post. Normalizing your data like you would with a relational database is usually not a good idea in MongoDB. Normalization in relational databases is only feasible under the premise that JOINs between tables are relatively cheap.
It is important that a database is normalized to minimize redundancy (duplicate data) and to ensure only related data is stored in each table. It also prevents any issues stemming from database modifications such as insertions, deletions, and updates. The stages of organization are called normal forms.
A poorly normalized database and poorly normalized tables can cause problems ranging from excessive disk I/O and subsequent poor system performance to inaccurate data. An improperly normalized condition can result in extensive data redundancy, which puts a burden on all programs that modify the data.
Even some NoSQL databases like Cassandra encourage a very normalized approach to storing data. Normalization typically entails creating a series of tables, each of which can have a different set of fields, but where each record in a given table must have a value for all of its fields — no more, no less.
I've read many articles and grasped a part of it . Normalization, which is increasing the complexity of the schema by splitting tables into multiple smaller ones to reduce the data redundancy ( 1NF, 2NF, 3NF) . But Mongo follows the exact opposite way of what we do with SQL.
The data is stored in a JSON-like format and can contain different kinds of structures. For example, in the same collection we can have the next two documents: To get the best out of MongoDB, you have to understand and follow some basic database design principles.
Normalization - normalizing means storing data into multiple collections with references between them. The data is defined once, making the writing tasks (update) easy. When it comes to reading tasks, normalization has its downsides. If you want to receive data from multiple collections, you have to perform multiple queries making the reads slower.
You have to write your own joins. You have to model your indexes very carefully. You also have to be very aware of "write safety". MongoDB has highly configurable levels of "how much do I care about this write". This needs to be part of your code & configuration and it typically needs to be customized correctly or performance will suffer.
Try this approach:
Work out which entity (or entities) are the hero(s)
With 'hero', I mean the entity(s) that the database is centered around. Let's take your example. The hero of the real-estate example is the house*.
Work out the ownerships
Go through the other entities, such as the owner, agency, images and reviews and ask yourself whether it makes sense to place their information together with the house. Would you have a cascading delete on any of the foreign keys in your relational database? If so, then that implies ownership.
Work out whether it actually matters that data is de-normalised
You will have agency (and probably owner) details spread across multiple houses. Does that matter?
Your house collection will probably look like this:
house: {
owner,
agency,
images[], // recommend references to GridFS here
reviews[] // you probably won't get too many of these for a single house
}
*Actually, it's probably the ad of the house (since houses are typically advertised on a real-estate website and that's probably what you're really interested in) so just consider that
Sarah Mei wrote an informative article about the kinds of issues that can arise with data integrity in nosql dbs. The choice between duplicate data or using id's, code based joins and the challenges with keeping data integrity. Her take is that any nosql db with code based joins will lose data integrity at some point. Imho the articles comments are as valuable as the article itself in understanding these issues and possible resolutions.
Link: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/comment-page-1/
I would just like to give a normalization refresher from the MongoDB
's perspective -
MongoDB
, it looks like embedding data would mostly cause this. And in fact, we should try to avoid embedding data in documents in MongoDB
which possibly create these anomalies. Occasionally, we might need to duplicate data in the documents for performance reasons. However that's not the default approach. The default is to avoid it.MongoDB
is flexible enough because it allows addition of keys without re-designing all the documentsMongoDB
. And one of the ideas behind the MongoDB
is to tune up your database to the applications that we're trying to write and the problem we're trying to solve.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With