
Many to many relationships with MongoDB at large scale

Tags:

mongodb

I've seen many posts on how to do many-to-many relationships with MongoDB, but none of them mention scale. For example, these posts:

MongoDB Many-to-Many Association

How to organise a many to many relationship in MongoDB

The problem I can see with this kind of setup is MongoDB's 16MB document limit. Say I have users, groups, and posts. Each post belongs to a group and can be liked by many users. A group has many posts in it and can be followed by many users. A user can like many posts and follow many groups. If I were to build this with a relational database, I would set it up like this:

user:
    user_id
    username

post:
    post_id
    group_id
    message

group:
    group_id
    name

post_likes:
    post_id
    liked_user_id

group_followers:
    group_id
    follower_user_id

In theory, a group can have an unlimited number of posts and followers, a post can have an unlimited number of likers, and a user can have an unlimited number of liked posts and followed groups, provided pagination is done correctly in the SQL queries.

How can I set up the schema in MongoDB so that this sort of scale can be achieved?

asked Aug 08 '15 01:08 by Mike V




1 Answer

This is a good question which illustrates the problems with overembedding and how to deal with them.

Example: Post likes

Let's stick with the simple example of users liking posts. The other relations would be handled the same way.

You are absolutely right that storing the likes inside the post document would sooner or later lead to very popular posts reaching the size limit.

So you correctly fell back to creating a post_likes collection. Why do I call this correct? Because it fits your use cases and your functional and non-functional requirements:

  • It scales indefinitely (well, there is a theoretical limit, but it is humongous)
  • It is easy to maintain (create a unique index over post_id and liked_user_id) and use (both the user and the post are known, so adding a like is a simple insert or more likely an upsert)
  • You are able to easily find out which users like which post and which post is liked by which users
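A minimal sketch of why the unique index makes liking idempotent. In the mongo shell the two pieces would be `db.post_likes.createIndex({ post_id: 1, liked_user_id: 1 }, { unique: true })` and an upsert such as `db.post_likes.updateOne({ post_id: p, liked_user_id: u }, { $setOnInsert: { user_name: n } }, { upsert: true })`. To keep the example runnable without a server, the JavaScript below models the collection in memory: a `Map` keyed on the (post_id, liked_user_id) pair stands in for the unique index, and the hypothetical `likePost` helper mimics the upsert.

```javascript
// In-memory stand-in for the post_likes collection (assumption: no MongoDB
// connection; the Map key plays the role of the unique compound index).
class PostLikes {
  constructor() {
    this.docs = new Map(); // compound key -> like document
  }

  // Mirrors the unique index on { post_id: 1, liked_user_id: 1 }.
  static key(postId, userId) {
    return JSON.stringify([postId, userId]);
  }

  // Hypothetical helper mimicking
  //   updateOne({ post_id, liked_user_id }, { $setOnInsert: extra }, { upsert: true })
  // Returns true if a new like was inserted, false if it already existed.
  likePost(postId, userId, extra = {}) {
    const k = PostLikes.key(postId, userId);
    if (this.docs.has(k)) return false; // match found: $setOnInsert changes nothing
    this.docs.set(k, { post_id: postId, liked_user_id: userId, ...extra });
    return true;
  }

  // Equivalent of countDocuments({ post_id: postId }).
  countLikes(postId) {
    let n = 0;
    for (const doc of this.docs.values()) if (doc.post_id === postId) n++;
    return n;
  }
}

const likes = new PostLikes();
likes.likePost("p1", "u1", { user_name: "JoeCool" });
likes.likePost("p1", "u1", { user_name: "JoeCool" }); // double click: no second like
likes.likePost("p1", "u2", { user_name: "Alice" });
console.log(likes.countLikes("p1")); // 2
```

Using `$setOnInsert` with an upsert rather than a plain insert means a repeated like is a harmless no-op instead of a duplicate-key error from the unique index.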

However, I would expand the collection a bit to avoid unneeded queries for certain frequent use cases.

Let's assume for now that post titles and usernames can't be changed. In that case, the following data model could make more sense:

{
  _id: new ObjectId(),
  "post_id": someValue,
  "post_title": "Cool thing",
  "liked_user_id": someUserId,
  "user_name": "JoeCool"
}

Now let's assume you want to display the username of all users that liked a post. With the model above, that would be a single, rather fast query:

db.post_likes.find(
  { "post_id": someValue },
  { _id: 0, user_name: 1 }
)
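The query above returns every liker at once. Because a popular post can have an unbounded number of likers, an application would normally page through them. A minimal sketch, assuming keyset pagination on `_id` (the page boundary is the last `_id` already seen). In the mongo shell that would look like `db.post_likes.find({ post_id: someValue, _id: { $gt: lastId } }, { _id: 1, user_name: 1 }).sort({ _id: 1 }).limit(20)`, ideally backed by an index on `{ post_id: 1, _id: 1 }`. The JavaScript below models that query in memory; numeric `_id`s stand in for ObjectIds, and `pageOfLikers` is a hypothetical name.

```javascript
// In-memory stand-in for post_likes documents (assumption: positive numeric
// _id values replace ObjectIds, which also sort roughly by insertion time).
const postLikes = [
  { _id: 1, post_id: "p1", liked_user_id: "u1", user_name: "JoeCool" },
  { _id: 2, post_id: "p2", liked_user_id: "u1", user_name: "JoeCool" },
  { _id: 3, post_id: "p1", liked_user_id: "u2", user_name: "Alice" },
  { _id: 4, post_id: "p1", liked_user_id: "u3", user_name: "Bob" },
  { _id: 5, post_id: "p1", liked_user_id: "u4", user_name: "Carol" },
];

// Hypothetical helper modelling
//   find({ post_id, _id: { $gt: afterId } }, { _id: 1, user_name: 1 })
//     .sort({ _id: 1 }).limit(pageSize)
function pageOfLikers(docs, postId, afterId, pageSize) {
  return docs
    .filter((d) => d.post_id === postId && d._id > afterId)
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize)
    .map((d) => ({ _id: d._id, user_name: d.user_name }));
}

// Walk all pages; the caller only handles one page of names at a time.
// Starting at 0 works because all ids are positive (a real first query
// would simply omit the _id condition).
let lastId = 0;
for (;;) {
  const page = pageOfLikers(postLikes, "p1", lastId, 2);
  if (page.length === 0) break;
  console.log(page.map((d) => d.user_name).join(", "));
  lastId = page[page.length - 1]._id; // resume after the last _id seen
}
```

Unlike skip-based paging, the `$gt` condition lets the index jump straight to the next page, so later pages cost no more than the first.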

With only the IDs stored, this rather common task would need at least two queries and, given that a post can have an unbounded number of likers, potentially huge memory consumption (you would need to hold all the user IDs in RAM).
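For contrast, a minimal sketch of the IDs-only variant. The like documents hold only `post_id` and `liked_user_id`, so getting the names takes a second lookup against a users collection using the collected IDs. In the mongo shell that is a `find` on `post_likes` followed by something like `db.users.find({ _id: { $in: ids } }, { username: 1 })`. The in-memory arrays and the `likerNamesIdsOnly` helper are hypothetical stand-ins for the two collections.

```javascript
// Hypothetical in-memory stand-ins for the users and post_likes collections.
const users = [
  { _id: "u1", username: "JoeCool" },
  { _id: "u2", username: "Alice" },
  { _id: "u3", username: "Bob" },
];
const postLikesIdsOnly = [
  { post_id: "p1", liked_user_id: "u1" },
  { post_id: "p1", liked_user_id: "u3" },
  { post_id: "p2", liked_user_id: "u2" },
];

function likerNamesIdsOnly(postId) {
  // Query 1: collect every liker's ID. With an unbounded number of likers,
  // this array is what grows in application memory.
  const ids = postLikesIdsOnly
    .filter((l) => l.post_id === postId)
    .map((l) => l.liked_user_id);

  // Query 2: resolve the IDs to names, like find({ _id: { $in: ids } }).
  const wanted = new Set(ids);
  return users.filter((u) => wanted.has(u._id)).map((u) => u.username);
}

console.log(likerNamesIdsOnly("p1"));
```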

Granted, this leads to some redundancy, but even when millions of people like a post, the redundant fields cost only a modest amount of relatively cheap (and easy to scale) disk space, while we gain a lot of performance in terms of user experience.
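To put a rough number on that redundancy, here is a back-of-the-envelope sketch based on the BSON encoding of a string field. That encoding is 1 type byte, then the field name as a NUL-terminated string, then a 4-byte length, then the UTF-8 value plus a trailing NUL. The sketch counts only the two denormalized fields from the example document and ignores compression; WiredTiger compresses data blocks by default, which usually shrinks the on-disk figure. The helper name `bsonStringFieldBytes` is hypothetical.

```javascript
// Size in bytes of one BSON string element:
// type byte + field name (cstring) + int32 length + UTF-8 value + trailing NUL.
function bsonStringFieldBytes(name, value) {
  const nameBytes = Buffer.byteLength(name, "utf8") + 1; // cstring terminator
  const valueBytes = Buffer.byteLength(value, "utf8") + 1; // string terminator
  return 1 + nameBytes + 4 + valueBytes;
}

// Redundant fields taken from the example like document.
const perLike =
  bsonStringFieldBytes("post_title", "Cool thing") +
  bsonStringFieldBytes("user_name", "JoeCool");

const likeCount = 1_000_000;
const totalMiB = (perLike * likeCount) / (1024 * 1024);
console.log(perLike, totalMiB.toFixed(1));
```

For these short example values the redundancy works out to 50 bytes per like, or roughly 48 MiB per million likes before compression. Longer titles and names scale the figure linearly.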

Now here is the thing: even if user names and post titles are subject to change, you would only have to do a multi-document update:

db.post_likes.updateMany(
  { "post_id": someId },
  { $set: { "post_title": newTitle } }
)
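A minimal in-memory sketch of what such a multi-document update does. The hypothetical `updateManyInMemory` helper stands in for `updateMany`. Renaming a user works the same way, filtering on `liked_user_id` and setting `user_name`; in the mongo shell that is `db.post_likes.updateMany({ liked_user_id: someUserId }, { $set: { user_name: newName } })`. The unique index on `{ post_id: 1, liked_user_id: 1 }` already serves the post rename, because `post_id` is its prefix. The user rename would need its own index, for example `{ liked_user_id: 1 }`, to avoid a collection scan.

```javascript
// Hypothetical in-memory stand-in for the post_likes collection.
const postLikes = [
  { post_id: "p1", post_title: "Cool thing", liked_user_id: "u1", user_name: "JoeCool" },
  { post_id: "p1", post_title: "Cool thing", liked_user_id: "u2", user_name: "Alice" },
  { post_id: "p2", post_title: "Other thing", liked_user_id: "u1", user_name: "JoeCool" },
];

// Models updateMany(filter, { $set: changes }): applies the $set to every
// matching document and returns how many matched (like matchedCount).
function updateManyInMemory(docs, matches, changes) {
  let matched = 0;
  for (const doc of docs) {
    if (!matches(doc)) continue;
    Object.assign(doc, changes);
    matched++;
  }
  return matched;
}

// Post renamed: every like of p1 gets the new title.
updateManyInMemory(postLikes, (d) => d.post_id === "p1", { post_title: "Very cool thing" });
// User renamed: every like by u1 gets the new name.
updateManyInMemory(postLikes, (d) => d.liked_user_id === "u1", { user_name: "JoeCooler" });
```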

You trade a slower path for rare operations, such as changing a username or a post title, for extreme speed in use cases that happen extremely often.

Bottom line

Keep in mind that MongoDB is a document oriented database. So document the events you are interested in with the values you need for future queries and model your data accordingly.

answered Sep 28 '22 11:09 by Markus W Mahlberg