This question is for all NoSQL and especially MongoDB experts out there. I started by designing a relational DB for a project, but the client wants us to use a DB that can scale easily, so we have decided to use MongoDB. These days I am having trouble mapping my relational model to NoSQL. I have a users table that has a many-to-many relation with several other tables (battles, items, locations, and units).
I have a few options when converting it for MongoDB:
Option 1 (with complete rows in users):
users: {
    _id: <user_id>,
    battles: [ battle1, battle2, ... ],
    items: [ item1, item2, ... ],
    locations: [ location1, location2, ... ],
    units: [ unit1, unit2, ... ]
}
battles: {
    <battle_info>
}
locations: {
    <location_info>
}
units: {
    <units_info>
}
items: {
    <items_info>
}
Option 2 (with only foreign keys in users):
users: {
    _id: <user_id>,
    battles: [ battle1_id, battle2_id, ... ],
    items: [ item1_id, item2_id, ... ],
    locations: [ location1_id, location2_id, ... ],
    units: [ unit1_id, unit2_id, ... ]
}
battles: {
    <battle_info>
}
locations: {
    <location_info>
}
units: {
    <units_info>
}
items: {
    <items_info>
}
Option 3 (user ids in other tables):
users: {
    _id: <user_id>
}
battles: {
    <battle_info>,
    user: [ user1_id, user2_id, ... ]
}
locations: {
    <location_info>,
    user: [ user1_id, user2_id, ... ]
}
units: {
    <units_info>,
    user: [ user1_id, user2_id, ... ]
}
items: {
    <items_info>,
    user: [ user1_id, user2_id, ... ]
}
Option 1 has a lot of duplication, since we are embedding complete rows of the other tables. One issue I see is that if a certain item or battle is updated, we will have to find all occurrences of it in the users collection and update them as well. On the other hand, it gives us the advantage of always having a complete user object that can be handed to the client application at login. A sketch of that update fan-out follows below.
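To make that cost concrete, here is a rough mongo-shell sketch of what such a fan-out update could look like; it assumes each embedded battle copy keeps its _id and a name field, which are placeholder names, not part of my actual schema:

// Update the canonical battle document first ("name" is an assumed field).
db.battles.update({ _id: battleId }, { $set: { name: "New battle name" } });

// Then fix every embedded copy inside users. The positional $ operator
// updates the matching array element; multi:true touches all affected users.
db.users.update(
    { "battles._id": battleId },
    { $set: { "battles.$.name": "New battle name" } },
    { multi: true }
);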
Option 2 is more relational, where we only keep the mongoIds of the other tables in the users collection. The advantage of this option is that updating a battle or item doesn't cost much, since rows are referenced rather than copied. On the other hand, when a user logs in we will have to look up all referenced units, battles, items, and locations to respond with a complete user object, as sketched below.
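Assembling the complete user object at login could look roughly like this in the mongo shell (a sketch only, assuming the Option 2 layout above where each array holds plain _ids):

var user = db.users.findOne({ _id: userId });

// One $in query per referenced collection, then stitch the full object
// together at the application level.
user.battles   = db.battles.find({ _id: { $in: user.battles } }).toArray();
user.items     = db.items.find({ _id: { $in: user.items } }).toArray();
user.locations = db.locations.find({ _id: { $in: user.locations } }).toArray();
user.units     = db.units.find({ _id: { $in: user.units } }).toArray();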
Option 3 is the opposite of option 2: the mongoIds from the users table are kept in the other tables. This option doesn't appeal to me much.
I would really appreciate it if someone could guide me or come up with a better model.
Edit:
Basically this is an MMORPG where multiple client apps will connect to the server through web services. We have a local DB on the client to store data. I want a model through which the server can respond with a complete user object and then update or insert the data changed on the client apps.
First, NoSQL is not one-size-fits-all. In SQL, almost every 1:N and M:N relation is modeled in the same way. The NoSQL philosophy is that how you model the data depends on the data and its usage patterns.
Second, I agree with Mark Baker: scaling is hard, and it's achieved by loosening constraints; it's not a technology matter. I love working with MongoDB, but for other reasons (no need to write ugly SQL, no need for a complicated, bloated ORM, etc.).
Now let's review your options. Option 1 copies more data than needed. You will often have to denormalize some data, but rarely all of it; once you would be copying everything, it's cheaper to fetch the referenced object instead.
Options 2 and 3 are very similar. The key question is: who's writing? You don't want a lot of clients having write access to the same document, because that forces you to use a locking mechanism and/or restrict yourself to modifier operations only. For that reason, option 2 is probably better than 3. However, if user A attacks user B, that also triggers a write to user B's document, so you still have to make sure your writes are safe.
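To illustrate what "modifier operations only" means, here is a minimal sketch of how an attack on user B could be written as a single atomic update instead of a read-modify-write cycle (the field names unitsLost and battles are assumptions, not part of your schema):

// One atomic update on the defender's document; concurrent attackers
// won't overwrite each other's changes.
db.users.update(
    { _id: defenderId },
    {
        $inc:  { unitsLost: 5 },
        $push: { battles: battleId }
    }
);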
Option 4 Partial denormalization: your user object seems to be the most important, so how about this:
user {
    battles : [ { "Name" : "The battle of foo", "Id" : 4354 }, ... ],
    ...
}
This will make it easier to show, e.g., a user dashboard, because the dashboard doesn't need to know all the details. Note that the data structure is then coupled to details of the presentation.
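A dashboard query would then only need the embedded summaries, e.g. (a sketch, assuming the field names above):

// No extra lookups: the summary sub-documents already live in the user.
db.users.findOne(
    { _id: userId },
    { battles: 1, items: 1 }
);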
Option 5 Data on edges. Often, the relation needs to hold data as well:
user {
    battles : [ { "Name" : "The battle of foo", "unitsLost" : 54, "Id" : 34354 }, ... ]
}
Here, unitsLost is specific to the user and the battle, so the data sits on the edge of the graph. Unlike the battle's name, this data is not denormalized.
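Writing such an edge could be a single $push on the user document, for example (again just a sketch with assumed field names):

// Denormalized battle name plus edge-specific data in one array element.
db.users.update(
    { _id: userId },
    { $push: { battles: { "Id" : 34354, "Name" : "The battle of foo", "unitsLost" : 54 } } }
);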
Option 6 Linker collections. Of course, such 'edge-data' can grow huge and might even call for a separate collection (linker collection). This fully eliminates the problem of access locks:
user {
    "_id" : 3443
}
userBattles {
    userId : 3443,
    battleId : 4354,
    unitsLost : 43,
    itemsWon : [ <some list> ],
    // much more data
}
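Reading then becomes a two-step, application-level join over the linker collection, roughly like this (a sketch using the names above):

// Step 1: fetch the edges for this user.
var edges = db.userBattles.find({ userId: 3443 }).toArray();

// Step 2: fetch the referenced battles in one $in query.
var battleIds = edges.map(function (e) { return e.battleId; });
var battles = db.battles.find({ _id: { $in: battleIds } }).toArray();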
Which of these is best depends on a lot of details of your application. If users make a lot of clicks (i.e. you have a fine-grained interface), it makes sense to split up objects as in options 4 or 6. If you really need all the data in one batch, partial denormalization doesn't help, so option 2 would be preferable. Keep the multiple-writer problem in mind.
Option 2 is the way to go.
If you did this in an RDBMS, at some point (once you have to start scaling horizontally) you would also need to remove SQL joins and join the data at the application level.
Even 10gen recommends using "manual" reference ids: http://www.mongodb.org/display/DOCS/Database+References
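In practice a manual reference is just the referenced _id stored in a field of your own, resolved with a second query when needed, for example (sketch only, names assumed):

// Store the plain _id (a manual reference) rather than a DBRef.
db.users.update({ _id: userId }, { $push: { battles: battleId } });

// Resolve it yourself when needed.
var battle = db.battles.findOne({ _id: battleId });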