
MongoDB Array or Separate collection

Tags:

mongodb

I have a users collection. Each user may have:
- a large number of followers (100K+), and may be following a large number of other users
- a large list of favorites
- a large list of items viewed

I see 2 possible schema designs. Regarding queries, I need to find the people a user is following, and I also need the favorites and watch list of a given user. All lists (followers, following, favorites) must have unique entries.

I searched Google for similar questions or topics but couldn't find anything.

Can MongoDB handle large arrays like these, or should I go with design 2, which stores the mappings in separate collections and allows an unlimited number of mappings?

I would appreciate your thoughts.

I am leaning toward option 2 because it allows an unlimited number of mappings. Before I go that route, I want to check whether it has drawbacks I have not anticipated.

Moving from one design to the other will be expensive.

Design 1 (EMBEDDED ARRAYS TO STORE MAPPINGS):
[
{
  user: "bob", // (key)
  followers: ["Alex", "john", "steve", "mark", ... 200K+ entries],
  following: ["Mila", "mark", "Bill", "Joe", ... 100K+ entries],
  favorites: [ObjectId(1), ObjectId(2), ... 5K+ entries],
  watched: [ObjectId(4), ObjectId(5), ... 100K+ entries]
},
{
  user: "Nick", // (key)
  followers: ["bob", "kery", "Jery", "Tom", ... 200K+ entries],
  following: ["Tim", "Shane", "Sally", "Joe", ... 100K+ entries],
  favorites: [ObjectId(4), ObjectId(5), ... 5K+ entries],
  watched: [ObjectId(2), ObjectId(9), ... 100K+ entries]
}
]

Design 2 (SEPARATE COLLECTIONS TO STORE MAPPINGS)

user_followers collection:
[
 { user: "bob", follower: "Alex" }, // key: (user, follower)
 { user: "bob", follower: "john" },
 { user: "bob", follower: "steve" },
 { user: "bob", follower: "mark" },
 ... 200K+ entries
]

user_following collection:
[
 { user: "bob", following: "Mila" }, // key: (user, following)
 { user: "bob", following: "mark" },
 { user: "bob", following: "Bill" },
 { user: "bob", following: "Joe" },
 ... 100K+ entries
]

user_favorites collection:
[
 { user: "bob", favorite: ObjectId(1) },
 { user: "bob", favorite: ObjectId(3) },
 { user: "bob", favorite: ObjectId(6) },
 ... 5K+ entries
]
kheya asked Jun 10 '26 07:06



2 Answers

Can MongoDB handle large arrays like these, or should I go with design 2, which stores the mappings in separate collections and allows an unlimited number of mappings?

In MongoDB, a single document can be at most 16 MB. With your first design, you would risk reaching that limit as the arrays grow, I suppose.
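A rough size estimate makes the risk concrete. The sketch below approximates the BSON size of each Design 1 array from the entry counts given in the question. The 10-byte average username and the helper names are assumptions for illustration, not measurements.

```javascript
// Rough BSON size estimate for the embedded arrays of Design 1.
// Assumption: usernames average 10 UTF-8 bytes (not stated in the question).
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's per-document limit

// A BSON array is stored as a document whose keys are "0", "1", "2", ...
// Each element costs: 1 type byte + key digits + 1 null byte + the value.
function bsonArrayBytes(count, valueBytes) {
  let total = 4 + 1; // int32 length prefix + trailing null byte
  for (let i = 0; i < count; i++) {
    total += 1 + String(i).length + 1 + valueBytes;
  }
  return total;
}

// A BSON string value: int32 length + bytes + trailing null byte.
const stringValueBytes = (utf8Length) => 4 + utf8Length + 1;
const OBJECT_ID_BYTES = 12;

const estimate = {
  followers: bsonArrayBytes(200000, stringValueBytes(10)),
  following: bsonArrayBytes(100000, stringValueBytes(10)),
  favorites: bsonArrayBytes(5000, OBJECT_ID_BYTES),
  watched: bsonArrayBytes(100000, OBJECT_ID_BYTES),
};
const totalBytes = Object.values(estimate).reduce((a, b) => a + b, 0);

console.log(estimate, totalBytes, totalBytes < MAX_BSON_BYTES);
```

Under these assumptions the document is at roughly half the limit at the minimum counts quoted in the question, so a user with about twice as many entries would no longer fit.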

Regarding the second design though, it seems to me the user_followers and user_following collections just duplicate the same data: if bob is following martha, then bob is a follower of martha, so you could merge those two collections into one with entries like { followed: 'martha', follower: 'bob' }

Update

There have been questions in the comments about how to handle bidirectional relationships and which indexes to use for queries.

Given two users bob and martha, either they have no relationship, or bob follows martha, or martha follows bob, or bob and martha follow each other, i.e. three possible relations besides none.

Now for the case where bob follows martha, the followers collection would be

[
  {
    followed: 'martha',
    follower: 'bob'
  }
]

For the case where martha follows bob, it would be

[
  {
    followed: 'bob',
    follower: 'martha'
  }
]

And when both follow each other

[
  {
    followed: 'martha',
    follower: 'bob'
  }, {
    followed: 'bob',
    follower: 'martha'
  }
]
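Both directions can be read from this one collection. The sketch below models it as an in-memory array so it runs without a server. The helper names `followersOf` and `followingOf` and the extra sample user are hypothetical; the comments show the `find` filter each helper would correspond to in the mongo shell, assuming the collection is named `followers`.

```javascript
// In-memory stand-in for the merged collection (sample data only).
const followers = [
  { followed: 'martha', follower: 'bob' },
  { followed: 'bob', follower: 'martha' },
  { followed: 'martha', follower: 'alex' },
];

// Who follows `user`?  Shell filter: db.followers.find({ followed: user })
function followersOf(edges, user) {
  return edges.filter((e) => e.followed === user).map((e) => e.follower);
}

// Whom does `user` follow?  Shell filter: db.followers.find({ follower: user })
function followingOf(edges, user) {
  return edges.filter((e) => e.follower === user).map((e) => e.followed);
}

console.log(followersOf(followers, 'martha')); // [ 'bob', 'alex' ]
console.log(followingOf(followers, 'bob'));    // [ 'martha' ]
```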

Only one operation is expensive with this design, and it was just as expensive in Designs 1 and 2 for the same reason: finding bidirectional relationships (e.g. bob and martha follow each other) requires isolating the elements common to two sets, the users someone follows and the users who follow them.
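That intersection can be sketched as follows. This is an application-side illustration over an in-memory array, not a server-side query; `mutualFollows` and the sample data are hypothetical.

```javascript
// Sample follow edges: bob and martha follow each other, alex follows bob.
const edges = [
  { followed: 'martha', follower: 'bob' },
  { followed: 'bob', follower: 'martha' },
  { followed: 'bob', follower: 'alex' },
];

// Return the users that `user` follows AND is followed by.
function mutualFollows(allEdges, user) {
  // First set: everyone who follows `user`.
  const followsUser = new Set(
    allEdges.filter((e) => e.followed === user).map((e) => e.follower)
  );
  // Second set, filtered against the first: everyone `user` follows.
  return allEdges
    .filter((e) => e.follower === user && followsUser.has(e.followed))
    .map((e) => e.followed);
}

console.log(mutualFollows(edges, 'bob'));  // [ 'martha' ]
console.log(mutualFollows(edges, 'alex')); // []
```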

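As far as indexes go, only two are of any use whatsoever: { follower: 1, followed: 1 } and { followed: 1, follower: 1 }. Either one covers lookups of a specific (follower, followed) pair, but a compound index only helps filters that start with its first field: the first serves "whom does X follow?" and the second serves "who follows X?", so keep both if you need both queries.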
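One way to check which of the two indexes a given filter can use is the prefix rule for compound indexes: an index helps when the filtered fields cover a leading prefix of its keys. The sketch below is a simplified model of that rule, not MongoDB's query planner, and `usablePrefixLength` is a hypothetical helper.

```javascript
// Simplified model of the compound-index prefix rule: count how many leading
// index keys are covered by the equality-filtered fields of a query.
function usablePrefixLength(indexKeys, filterFields) {
  let n = 0;
  while (n < indexKeys.length && filterFields.includes(indexKeys[n])) n++;
  return n; // 0 means the index cannot narrow this filter
}

const byFollower = ['follower', 'followed']; // { follower: 1, followed: 1 }
const byFollowed = ['followed', 'follower']; // { followed: 1, follower: 1 }

console.log(usablePrefixLength(byFollower, ['follower']));             // 1
console.log(usablePrefixLength(byFollower, ['followed']));             // 0
console.log(usablePrefixLength(byFollowed, ['followed']));             // 1
console.log(usablePrefixLength(byFollower, ['follower', 'followed'])); // 2
```

Declaring one of the two indexes as unique, for example with `db.followers.createIndex({ follower: 1, followed: 1 }, { unique: true })` (assuming the collection is named `followers`), would also enforce the no-duplicates requirement from the question.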

Now, to come back to Design 2, the use cases above would have been:

bob follows martha

user_followers

[
  {
    user: 'martha',
    follower: 'bob'
  }
]

user_following

[
  {
    user: 'bob',
    following: 'martha'
  }
]

martha follows bob

user_followers

[
  {
    user: 'bob',
    follower: 'martha'
  }
]

user_following

[
  {
    user: 'martha',
    following: 'bob'
  }
]

bob and martha follow each other

user_followers

[
  {
    user: 'bob',
    follower: 'martha'
  }, {
    user: 'martha',
    follower: 'bob'
  }
]

user_following

[
  {
    user: 'martha',
    following: 'bob'
  }, {
    user: 'bob',
    following: 'martha'
  }
]

Now we can see that, as I indicated, Design 2 would duplicate all follower information with absolutely no benefit whatsoever.
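The duplication can be checked mechanically: user_following is fully derivable from user_followers by swapping the two roles. A minimal sketch using the mutual-follow case above (the variable and helper names are hypothetical):

```javascript
// The two Design 2 collections for "bob and martha follow each other".
const userFollowers = [
  { user: 'bob', follower: 'martha' },
  { user: 'martha', follower: 'bob' },
];
const userFollowing = [
  { user: 'martha', following: 'bob' },
  { user: 'bob', following: 'martha' },
];

// Rebuild user_following from user_followers by swapping the two roles.
const derived = userFollowers.map((e) => ({ user: e.follower, following: e.user }));

// Compare the two lists as sets of (user, following) pairs.
const key = (e) => `${e.user}->${e.following}`;
const same =
  derived.length === userFollowing.length &&
  derived.every((d) => userFollowing.some((f) => key(f) === key(d)));

console.log(same); // true
```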

Gorkk answered Jun 11 '26 22:06



At first look, Design 1 is very likely to create documents that are too big for MongoDB: the 16 MB document size limit can become an issue.

Also, have you thought about your indexes at all? I think performance will suffer if you have to search for a relation inside a huge array such as users.following. I think it is wiser to go with the 2nd design, where simple indexes will perform very well.

PS: Is there really a reason for both a followers and a following collection? Maybe you can combine them into one.

xlembouras answered Jun 11 '26 23:06
