'schema' design for a social network

I'm working on a proof-of-concept app for a Twitter-style social network with about 500k users. I'm unsure how best to design the 'schema'.

Should I embed a user's subscriptions, or have a separate 'subscriptions' collection and use DB references? If I embed, I still have to perform a query to get all of a user's followers.

Given the following user:

{
 "username" : "alan",
 "photo": "123.jpg",
 "subscriptions" : [
    {"username" : "john", "status" : "accepted"},
    {"username" : "paul", "status" : "pending"}
  ]
}

To find all of alan's subscribers, I'd have to run something like this:

db.users.find({'subscriptions.username' : 'alan'});

From a performance point of view, is that any worse or better than having a separate subscriptions collection?

Also, when displaying a list of subscriptions/subscribers, I currently run into the n+1 problem, because the subscription document tells me the username of the target user but not other attributes I may need, such as the profile photo. Are there any recommended practices for such situations?

Thanks, Alan

asked May 15 '10 10:05

Alan B

2 Answers

First off, you should know the tradeoffs you are going to get with MongoDB and any other NoSQL database (but realize that I am a fan of it). If you are trying to normalize your data completely, you are making a big mistake. Even in relational databases, the larger your app gets, the more your data gets denormalized (see this post by Hot Potato). I've seen this time and time again. You should not go nuts and make a huge mess, but don't worry about repeating information in two places. One of the major points (in my opinion) of NoSQL is that your schema moves into your code and not solely into the database.

Now, to answer your question, I think your initial strategy is what I would do. MongoDB can index fields that hold arrays, including fields of the documents embedded in them, so that will make things a lot faster if you are looking for how many friendships a user has. But in reality, the only way to really be sure is to run some sort of test program that generates a database full of names and relationships.
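
To see why an index over the embedded array helps, here is a plain-Python analogy (not MongoDB's actual implementation): an inverted map from each subscribed-to username to the users whose subscriptions array mentions it. The helper name build_subscription_index and the sample users are made up for illustration.

```python
from collections import defaultdict


def build_subscription_index(users):
    """Map each target username to the set of users subscribed to it.

    Loosely mimics an index on 'subscriptions.username': one entry per
    array element, pointing back at the document that contains it.
    """
    index = defaultdict(set)
    for user in users:
        for sub in user.get("subscriptions", []):
            index[sub["username"]].add(user["username"])
    return index


# Hypothetical sample data in the shape used in the question.
users = [
    {"username": "alan", "subscriptions": [
        {"username": "john", "status": "accepted"},
        {"username": "paul", "status": "pending"}]},
    {"username": "john", "subscriptions": [
        {"username": "alan", "status": "accepted"}]},
    {"username": "paul", "subscriptions": [
        {"username": "alan", "status": "pending"}]},
]

index = build_subscription_index(users)

# Finding alan's subscribers is now one lookup instead of a full scan.
alan_subscribers = sorted(index["alan"])
```

In the mongo shell, the real index would be created with something like db.users.createIndex({'subscriptions.username': 1}); the find() from the question can then use it instead of scanning every user document.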

You can script up some input in Python or Perl or whatever you like, and use a file of names to generate some relationships. Check out the Census website, which has a list of last names. Download the file dist.all.last and write some program like:

#!/usr/bin/env python
import random

# dist.all.last: one surname per line, followed by frequency columns
with open('dist.all.last') as f:
    names = [line.split()[0] for line in f if line.strip()]

rels = {}
for name in names:
    num_friends = random.randint(0, 1000)
    # a set avoids listing the same friend twice
    friends = set(random.choice(names) for _ in range(num_friends))
    friends.discard(name)  # cannot be friends with yourself
    rels[name] = sorted(friends)

# take relationships (i.e. rels) and write them to MongoDB
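
The last step could look roughly like the sketch below. It assumes the embedded layout from the question, marks every generated relationship as "accepted" (an arbitrary choice for test data), and only requires that the collection object has an insert_many() method, as PyMongo collections do. The function names and the batch size are illustrative.

```python
def to_documents(rels):
    """Yield one user document per name, with friends as an embedded array."""
    for name, friends in rels.items():
        yield {
            "username": name,
            # Assumption: generated test relationships are all "accepted".
            "subscriptions": [
                {"username": friend, "status": "accepted"} for friend in friends
            ],
        }


def write_relationships(collection, rels, batch_size=1000):
    """Insert the documents in batches and return how many were written.

    `collection` is assumed to expose insert_many(list_of_documents).
    """
    batch = []
    written = 0
    for doc in to_documents(rels):
        batch.append(doc)
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            written += len(batch)
            batch = []
    if batch:  # flush whatever is left over
        collection.insert_many(batch)
        written += len(batch)
    return written


# Possible usage, assuming PyMongo is installed and a server runs locally
# (the database and collection names here are placeholders):
#   from pymongo import MongoClient
#   users = MongoClient()["test"]["users"]
#   write_relationships(users, rels)
```

Batching keeps memory use and round trips bounded; with the data loaded you can time the query from the question with and without an index.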

Also, as a general note, your fieldnames seem kind of long. Remember that the fieldnames are repeated with every document in that collection because you cannot rely on one field being in any other document. To save space, a general strategy is to use shorter fieldnames like "unam" instead of "username", but that's a small thing. See the great advice in these two posts.
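
As a rough back-of-the-envelope check on how much shorter names buy you: BSON stores each field name as a string inside every document, so every character dropped saves about one byte per occurrence. The sketch below uses the 500k users from the question; the average of 100 subscriptions per user is purely an assumption, and bytes_saved is a made-up helper.

```python
def bytes_saved(long_name, short_name, occurrences):
    """Bytes saved by renaming a field, counting UTF-8 key bytes per occurrence."""
    per_occurrence = len(long_name.encode("utf-8")) - len(short_name.encode("utf-8"))
    return per_occurrence * occurrences


# One top-level "username" key per user document.
top_level = bytes_saved("username", "unam", 500_000)

# Assumed average of 100 embedded subscriptions per user, each with its own key.
embedded = bytes_saved("username", "unam", 500_000 * 100)
```

Under these assumptions the top-level rename saves about 2 MB and the embedded one about 200 MB before any on-disk compression, which is why the effect only matters for keys that repeat inside large arrays.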

EDIT:

Actually, in pondering your problem a little more, I would make one more suggestion: break up the subscription types into different fields to make the indexes more efficient. For example, instead of:

{
 "username" : "alan",
 "photo": "123.jpg",
 "subscriptions" : [
    {"username" : "john", "status" : "accepted"},
    {"username" : "paul", "status" : "pending"}
  ]
}

as you had above, I would do this:

{
 "username" : "alan",
 "photo": "123.jpg",
 "acc_subs" : [ "john" ],
 "pnd_subs" : [ "paul" ]
}

That way you can have an index for each type of subscription, making queries like "How many people have Paul as pending?" and "How many people subscribe to Paul?" super fast either way. Mongo's indexing over array values is truly an epic win.
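
Here is a small in-memory sketch of those two questions against the split layout. It only imitates the matching rule (a document matches when the array contains the value); the sample users and the helper name subscribers_of are invented for the example.

```python
def subscribers_of(users, target, field="acc_subs"):
    """Return the usernames whose `field` array contains `target`."""
    return [u["username"] for u in users if target in u.get(field, [])]


# Hypothetical users in the acc_subs / pnd_subs layout.
users = [
    {"username": "alan", "acc_subs": ["john"], "pnd_subs": ["paul"]},
    {"username": "john", "acc_subs": ["paul", "alan"], "pnd_subs": []},
    {"username": "ray", "acc_subs": [], "pnd_subs": ["paul"]},
]

# "How many people subscribe to Paul?"
accepted = subscribers_of(users, "paul", "acc_subs")

# "How many people have Paul as pending?"
pending = subscribers_of(users, "paul", "pnd_subs")

# Against MongoDB the equivalent filters would be {"acc_subs": "paul"} and
# {"pnd_subs": "paul"}, each able to use its own single-field index.
```

Because each status lives in its own array of plain strings, each query touches exactly one index and never has to check a status value.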

answered Sep 24 '22 00:09

daveslab

@Alan B: I think that you're totally getting MongoDB. I agree with @daveslab's version of the data, but you'll probably want to add "followers" too.

{
 "username" : "alan",
 "photo": "123.jpg",
 "acc_subs" : [ "john" ],
 "pnd_subs" : [ "paul" ],
 "acc_fol" : [ "mike", "ray" ],
 "pnd_fol" : [ "judy" ]
}

Yes, it's duplicate information. It's up to the "business layer" to ensure that this data is correctly updated in both spots. Unfortunately, there are no transactions in Mongo; fortunately, you have the $addToSet operation, so you're pretty safe.
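
A sketch of what that business-layer step might look like: one helper builds the pair of $addToSet updates for an accepted follow, and a tiny in-memory stand-in shows why repeating them is harmless. Both function names are hypothetical, and apply_add_to_set only imitates the operator for a single value per field.

```python
def follow_updates(follower, target):
    """Build the two (filter, update) pairs for an accepted follow."""
    return [
        ({"username": follower}, {"$addToSet": {"acc_subs": target}}),
        ({"username": target}, {"$addToSet": {"acc_fol": follower}}),
    ]


def apply_add_to_set(doc, update):
    """In-memory stand-in for $addToSet: append only if the value is absent."""
    for field, value in update["$addToSet"].items():
        values = doc.setdefault(field, [])
        if value not in values:
            values.append(value)


docs = {"alan": {"username": "alan"}, "mike": {"username": "mike"}}

# Running the same pair twice (say, after a retry) still leaves one entry each.
for _ in range(2):
    for flt, update in follow_updates("mike", "alan"):
        apply_add_to_set(docs[flt["username"]], update)

# With PyMongo, each pair could be sent as collection.update_one(flt, update).
```

Since the updates are idempotent, the application can simply retry both whenever it is unsure whether the second one went through.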

answered Sep 22 '22 00:09

Gates VP
