I've been wondering about the ideal document structure for maximum query efficiency in various situations, and there's one I want to ask about. It's really born out of my not knowing how MongoDB behaves in memory in this specific kind of case. Let me give you a hypothetical scenario.
Imagine a Twitter-style system of Followers and Followees. After an admittedly cursory glance, the main options appear to be:
1. In each user document, a "followers" array containing references to the documents of all the other users they follow. Followees are found by searching for the current user in other users' "user.followers" arrays. The main downside would appear to be the potential query overhead of the Followee search. Also, for a query specifically for the contents of "user.followers", does MongoDB access just the required field in users' documents, or is the whole user document fetched and the required field values looked up from there? And is this cached/stored in such a way that a query over a large user base would require significantly more memory?
2. In each user document, storing both "followers" and "followees" for quicker access to each. This has the obvious downside of duplicated data: an entry for user A following user B exists in both user documents in the respective fields, and deletion from one requires a matching deletion from the other. Technically, this could be considered doubling the number of points of potential failure for a simple deletion. And does MongoDB still suffer from what I've heard described as "swiss cheesing" of its memory-stored data when deletions occur, so that removals from two fields rather than one double the effect of that memory hole problem?
3. A separate collection for storing users' Followers, queried in a similar fashion to the user documents in option 1, except that the only data being accessed is Followers, so if the user documents contain a lot of other data relevant to each user, we avoid accessing that data. This has something of a relational database feel to it, though, and while I know that's not always a terrible approach just on principle, if one of the other approaches mentioned (or one I haven't considered) is better under Mongo's architecture, I'd love to learn!
If anyone has any thoughts on this, or wants to tell me I've missed a very relevant and obvious docs page somewhere, or even wants to tell me that I'm just being stupid (though with an explanation of why, please ;) ), I'd love to hear from you!
Performance. Because the index contains all fields required by the query, MongoDB can both match the query conditions and return the results using only the index. Querying only the index can be much faster than querying documents outside of the index.
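As a rough illustration of the covered-query idea, here is a toy in-memory model in plain JavaScript. It is not how MongoDB's storage engine is implemented; the `collection` array, the `index` map, the `display_name` and `bio` fields, and both helper functions are assumptions made up for this sketch. The projected field is deliberately a scalar, since I'm not assuming an array field such as a followers list can be covered.

```javascript
// Hypothetical collection: each document carries a large field that a
// covered query should never need to read.
const collection = [
  { user_id: "alice", display_name: "Alice", bio: "x".repeat(1000) },
  { user_id: "bob", display_name: "Bob", bio: "y".repeat(1000) },
];

// Stand-in for a compound index on (user_id, display_name): it stores the
// indexed values plus the position of the full document.
const index = new Map(
  collection.map((doc, pos) => [
    doc.user_id,
    { user_id: doc.user_id, display_name: doc.display_name, pos },
  ])
);

let documentReads = 0; // counts how often a full document is fetched

// Covered: every requested field lives in the index entry, so the
// document itself is never touched.
function findNameCovered(userId) {
  const entry = index.get(userId);
  return entry
    ? { user_id: entry.user_id, display_name: entry.display_name }
    : null;
}

// Not covered: bio is not in the index, so the full document must be read.
function findBio(userId) {
  const entry = index.get(userId);
  if (!entry) return null;
  documentReads += 1;
  return { user_id: entry.user_id, bio: collection[entry.pos].bio };
}
```

Calling `findNameCovered("alice")` leaves `documentReads` at 0, while `findBio("bob")` increments it, which is the difference the paragraph above describes.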
MongoDB allows various ways to use tree data structures to model large hierarchical or nested data relationships. One such data model organizes documents in a tree-like structure by storing references to "parent" nodes in "child" nodes.
This is a classic follower-followee problem and there's no single answer to it. Check out this link:
mongo db design of following and feeds, where should I embed?
Actually this situation lends itself very well to a relational schema, if MongoDB and SQL server were the only choices you had. But this is a special type of relational problem wherein you have a two-way relationship. This can perhaps be better handled by a graph database:
http://forum.kohanaframework.org/discussion/10130/followers-and-following-database-design-like-twitter/p1
The thing is, you could keep either followers or followees in a User document, but not both, to avoid double-deletion issues. So if you must stick to MongoDB, one way out (assuming people don't follow/unfollow anyone that frequently) could be:
Keep just the followees in the document, because when I view my profile, I'd be interested in the people I follow (that's the reason I followed them in the first place, right?). Then, to find my followers, do a query like: db.Users.find({ followees : user_id })
This will tell who all are following me (say my id is 'user_id'), because matching a single value against an array field returns every document whose array contains that value.
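To make the followee-in-document idea concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB driver involved). The `users` array, the `followees` field name, and the helpers `followeesOf` and `followersOf` are assumptions for illustration; `followersOf` mimics what a query matching a value against the `followees` array field would return.

```javascript
// Hypothetical stand-in for a Users collection: each document stores only
// the ids of the people that user follows (its followees).
const users = [
  { user_id: "alice", followees: ["bob", "carol"] },
  { user_id: "bob", followees: ["carol"] },
  { user_id: "carol", followees: [] },
  { user_id: "dave", followees: ["alice", "carol"] },
];

// People a user follows: a single-document lookup.
function followeesOf(userId) {
  const doc = users.find((u) => u.user_id === userId);
  return doc ? doc.followees.slice() : [];
}

// People following a user: scan for documents whose followees array
// contains the id (in MongoDB, an index on followees would avoid the scan).
function followersOf(userId) {
  return users
    .filter((u) => u.followees.includes(userId))
    .map((u) => u.user_id);
}

console.log(followeesOf("alice")); // [ 'bob', 'carol' ]
console.log(followersOf("carol")); // [ 'alice', 'bob', 'dave' ]
```

The "who do I follow" lookup touches one document, while the "who follows me" lookup has to search across users, which is the trade-off this approach accepts.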
Another reason I don't suggest the other way round is that one may follow at most 30-40 people, so a User document storing 30-40 followees should be fine, as against a User document storing thousands of followers! With the followee-in-document approach, you get roughly even-sized User documents throughout. With the follower-in-document approach, you will have some very small but also some very bulky documents. And depending on the amount of follower data you put in (if any, apart from follower_id), you might want to be careful about the document size limit (16 MB per BSON document).
Given that it's a many-to-many relationship, option (2) looks good to me. As for the matching deletions, it's usually not an issue, as long as you have some sort of reconciliation mechanism between the two documents.
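One way such a reconciliation mechanism might look, sketched in plain JavaScript against an in-memory model rather than a real MongoDB collection. The `profiles` map and the `follow`, `unfollow`, and `reconcile` helpers are hypothetical, and treating the followees lists as the source of truth is an assumption of this sketch, not something the thread prescribes.

```javascript
// Hypothetical in-memory model of option (2): every user document keeps
// both a followees set and a followers set.
const profiles = new Map([
  ["alice", { followees: new Set(), followers: new Set() }],
  ["bob", { followees: new Set(), followers: new Set() }],
]);

// Following touches two documents: the follower's and the followee's.
function follow(followerId, followeeId) {
  profiles.get(followerId).followees.add(followeeId);
  profiles.get(followeeId).followers.add(followerId);
}

// Unfollowing needs the matching pair of deletions.
function unfollow(followerId, followeeId) {
  profiles.get(followerId).followees.delete(followeeId);
  profiles.get(followeeId).followers.delete(followerId);
}

// Reconciliation pass: treat followees as the source of truth (assumption),
// rebuild every followers set from it, and return how many entries changed.
function reconcile() {
  const expected = new Map([...profiles.keys()].map((id) => [id, new Set()]));
  for (const [id, doc] of profiles) {
    for (const followeeId of doc.followees) {
      if (expected.has(followeeId)) expected.get(followeeId).add(id);
    }
  }
  let repairs = 0;
  for (const [id, doc] of profiles) {
    const want = expected.get(id);
    for (const f of doc.followers) if (!want.has(f)) repairs += 1; // stale entry
    for (const f of want) if (!doc.followers.has(f)) repairs += 1; // missing entry
    doc.followers = want;
  }
  return repairs;
}
```

If an unfollow is interrupted after the first deletion, the next `reconcile()` run removes the stale followers entry; a second run then finds nothing to repair.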
Fragmentation depends largely on the application's access patterns and is an issue with most data systems. Some significant changes have been made to MongoDB to avoid internal fragmentation. Further, there are offline compaction options to fix fragmentation if it happens.