I have a users collection. Each user may have:
- a large number of followers (100K+), and may be following a large number of other users
- a large list of favorites
- a large list of items viewed
I see two possible schema designs. As for the queries, I need to find the people a user is following, and I also need to know the favorites and watch list of a given user. All lists (followers, following, favorites) must have unique entries.
I searched Google for similar questions or topics but couldn't find anything.
Can MongoDB handle large arrays like these, or should I go with Design 2, where I store the mappings in separate collections, which allows an unlimited number of mappings?
I would appreciate your thoughts.
I am leaning toward option 2 because it allows an unlimited number of mappings. But before I go that route, I want to check whether it has pitfalls I should know about.
Moving from one design to the other will be expensive.
Design 1 (EMBEDDED ARRAYS TO STORE MAPPINGS):
[
  {
    user: "bob", // (key)
    followers: ["Alex", "john", "steve", "mark", ... 200K+ entries],
    following: ["Mila", "mark", "Bill", "Joe", ... 100K+ entries],
    favorites: [ObjectId(1), ObjectId(2), ... 5K+ entries],
    watched: [ObjectId(4), ObjectId(5), ... 100K+ entries]
  },
  {
    user: "Nick", // (key)
    followers: ["bob", "kery", "Jery", "Tom", ... 200K+ entries],
    following: ["Tim", "Shane", "Sally", "Joe", ... 100K+ entries],
    favorites: [ObjectId(4), ObjectId(5), ... 5K+ entries],
    watched: [ObjectId(2), ObjectId(9), ... 100K+ entries]
  }
]
Design 2 (SEPARATE COLLECTIONS TO STORE MAPPINGS):
user_followers collection:
[
  { user: "bob", follower: "Alex" }, // key: (user, follower)
  { user: "bob", follower: "john" },
  { user: "bob", follower: "steve" },
  { user: "bob", follower: "mark" }
  ... 200K+ entries
]
user_following collection:
[
  { user: "bob", following: "Mila" }, // key: (user, following)
  { user: "bob", following: "mark" },
  { user: "bob", following: "Bill" },
  { user: "bob", following: "Joe" }
  ... 100K+ entries
]
user_favorites collection:
[
  { user: "bob", favorite: ObjectId(1) },
  { user: "bob", favorite: ObjectId(3) },
  { user: "bob", favorite: ObjectId(6) }
  ... 5K+ entries
]
Can MongoDB handle large arrays like these, or should I go with Design 2, where I store the mappings in separate collections, which allows an unlimited number of mappings?
In MongoDB, a single document can be at most 16 MB. With your first design, you would risk reaching that limit as the arrays grow.
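To get a feel for how close Design 1 comes to that limit, here is a rough back-of-the-envelope sketch in plain JavaScript. It follows the BSON array layout (a length prefix, then one element per entry made of a type byte, the index as a string key, and the value). The average name length of 10 bytes is an assumption, not something stated in the question, and the estimate ignores the top-level field names and any other fields on the user document.

```javascript
// Rough BSON size of an array field: int32 length + elements + 1 terminator byte.
// Each element is 1 type byte + the array index as a NUL-terminated key + the value.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // the 16 MB document limit

function arrayBytes(count, valueBytes) {
  let total = 4 + 1; // length prefix + terminator
  for (let i = 0; i < count; i++) {
    total += 1 + (String(i).length + 1) + valueBytes;
  }
  return total;
}

// A BSON string value is an int32 length + the UTF-8 bytes + a NUL terminator.
const stringValueBytes = (nameBytes) => 4 + nameBytes + 1;
const OBJECT_ID_BYTES = 12;

// ASSUMPTION: average user name length, chosen only for illustration.
const AVG_NAME_BYTES = 10;

const estimate =
  arrayBytes(200000, stringValueBytes(AVG_NAME_BYTES)) + // followers
  arrayBytes(100000, stringValueBytes(AVG_NAME_BYTES)) + // following
  arrayBytes(5000, OBJECT_ID_BYTES) +                    // favorites
  arrayBytes(100000, OBJECT_ID_BYTES);                   // watched

console.log(`${estimate} bytes, limit is ${MAX_BSON_BYTES} bytes`);
```

Under these assumptions the document lands at roughly half of the limit, so the example sizes from the question would fit today, but a user with a few times more followers or watched items would not.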
Regarding the second design, though, it seems to me that the user_followers and user_following collections just duplicate the same data: if bob is following martha, then bob is a follower of martha. You could therefore merge those two collections into one, with entries like { followed: 'martha', follower: 'bob' }.
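As a small illustration of why one collection is enough, the sketch below uses a plain in-memory array as a stand-in for the merged collection (it is not MongoDB driver code, and the sample rows are made up). Both directions of the relationship come from the same entries, just filtered on a different field.

```javascript
// In-memory stand-in for the merged collection: one entry per "follows" edge.
const follows = [
  { followed: 'martha', follower: 'bob' },
  { followed: 'martha', follower: 'alex' },
  { followed: 'bob', follower: 'martha' },
];

// Who follows `user`? Filter on the `followed` field.
const followersOf = (user) =>
  follows.filter((e) => e.followed === user).map((e) => e.follower);

// Whom does `user` follow? Same data, filtered on the `follower` field.
const followingOf = (user) =>
  follows.filter((e) => e.follower === user).map((e) => e.followed);

console.log(followersOf('martha')); // [ 'bob', 'alex' ]
console.log(followingOf('bob'));    // [ 'martha' ]
```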
Update
There have been questions in the comments about how to handle bidirectional relationships and about indexes for queries.
Given two users, bob and martha, there are four possibilities: they have no relationship, bob follows martha, martha follows bob, or bob and martha follow each other; that is, three different possible relations.
Now for the case where bob follows martha, the followers collection would be
[
{
followed: 'martha',
follower: 'bob'
}
]
For the case where martha follows bob, it would be
[
{
followed: 'bob',
follower: 'martha'
}
]
And when both follow each other
[
{
followed: 'martha',
follower: 'bob'
}, {
followed: 'bob',
follower: 'martha'
}
]
The only operation that is expensive with this design was also expensive in Designs 1 and 2, and for the same reason: we need to isolate the elements common to two sets. That operation is finding the bidirectional relationships (e.g. bob and martha follow each other).
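To make that operation concrete, here is a minimal in-memory sketch (plain JavaScript, not a MongoDB query; `mutualFollows` is a hypothetical helper name and the sample rows are made up). It intersects the set of users someone follows with the set of users following them back.

```javascript
// In-memory stand-in for the merged followers collection.
const edges = [
  { followed: 'martha', follower: 'bob' },
  { followed: 'bob', follower: 'martha' },
  { followed: 'bob', follower: 'alex' },
];

// Users that `user` follows AND that follow `user` back.
function mutualFollows(user, allEdges) {
  // Everyone `user` follows, as a set for constant-time membership checks.
  const following = new Set(
    allEdges.filter((e) => e.follower === user).map((e) => e.followed)
  );
  // Keep only the followers of `user` that are also in that set.
  return allEdges
    .filter((e) => e.followed === user && following.has(e.follower))
    .map((e) => e.follower);
}

console.log(mutualFollows('bob', edges)); // [ 'martha' ]
```

The cost comes from having to look at both sides of the relationship for the same user, whichever schema stores them.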
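As far as indexes go, only two are of any use whatsoever: { follower: 1, followed: 1 } and { followed: 1, follower: 1 }. Either one can serve an exact-pair lookup (does bob follow martha?), but a compound index is only used efficiently for queries on its leading field, so the first serves "whom does bob follow?" and the second serves "who follows bob?".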
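The sketch below models those two key orderings with plain in-memory Maps, assuming a lookup is driven by an index's leading field. It is an illustration only, not how MongoDB stores indexes, and `FollowIndex` is a hypothetical name. It also shows how keying on the (follower, followed) pair gives the uniqueness the question asks for.

```javascript
class FollowIndex {
  constructor() {
    this.byFollower = new Map(); // models key order { follower, followed }
    this.byFollowed = new Map(); // models key order { followed, follower }
  }

  // Returns false when the pair already exists, so entries stay unique.
  add(follower, followed) {
    const outgoing = this.byFollower.get(follower) || new Set();
    if (outgoing.has(followed)) return false;
    outgoing.add(followed);
    this.byFollower.set(follower, outgoing);

    const incoming = this.byFollowed.get(followed) || new Set();
    incoming.add(follower);
    this.byFollowed.set(followed, incoming);
    return true;
  }

  // Whom does `user` follow? Direct lookup on the first ordering.
  following(user) {
    return [...(this.byFollower.get(user) || [])];
  }

  // Who follows `user`? Direct lookup on the second ordering.
  followers(user) {
    return [...(this.byFollowed.get(user) || [])];
  }
}

const index = new FollowIndex();
console.log(index.add('bob', 'martha')); // true
console.log(index.add('bob', 'martha')); // false, duplicate pair rejected
console.log(index.followers('martha'));  // [ 'bob' ]
```

In MongoDB itself, the uniqueness part would typically be handled by creating one of the compound indexes with the `unique: true` option of `createIndex`, rather than by application code like this.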
Now, to come back to Design 2, the use cases above would have been:
bob follows martha
user_followers
[
{
user: 'martha',
follower: 'bob'
}
]
user_following
[
{
user: 'bob',
following: 'martha'
}
]
martha follows bob
user_followers
[
{
user: 'bob',
follower: 'martha'
}
]
user_following
[
{
user: 'martha',
following: 'bob'
}
]
bob and martha follow each other
user_followers
[
{
user: 'bob',
follower: 'martha'
}, {
user: 'martha',
follower: 'bob'
}
]
user_following
[
{
user: 'martha',
following: 'bob'
}, {
user: 'bob',
following: 'martha'
}
]
Now we can see that, as I indicated, Design 2 would duplicate all follower information with absolutely no benefit whatsoever.
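One way to see the duplication: the user_following entries can be derived mechanically from the user_followers entries by swapping the two fields. A minimal in-memory sketch, using the "bob and martha follow each other" data from above:

```javascript
// user_followers entries for the case where bob and martha follow each other.
const userFollowers = [
  { user: 'bob', follower: 'martha' },
  { user: 'martha', follower: 'bob' },
];

// Swapping the roles of the two fields reproduces user_following exactly.
const derivedFollowing = userFollowers.map(({ user, follower }) => ({
  user: follower,
  following: user,
}));

console.log(derivedFollowing);
// [ { user: 'martha', following: 'bob' }, { user: 'bob', following: 'martha' } ]
```

Since the second collection carries no information the first one lacks, keeping both only adds storage and a second write per follow.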
At first glance, Design 1 is very likely to create documents that are too big for MongoDB; the 16 MB size limit can be an issue.
Also, have you thought about your indexes at all? I think performance will suffer badly if you have to search for a relation inside a huge array such as users.following. I think it is wiser to go with the second design. With that you can have simple indexes that will perform very well.
PS: Is there really a reason to have both a followers and a following collection? Maybe you can combine them into one.