
Twitter-like model with RavenDB

I am playing around with RavenDB and trying to figure out the best way to model my objects for a Twitter-like scenario. So far I have come up with a few options, but I'm not sure which one is best.

public class User{
    public string Id{get;set;}
    public List<string> Following{get;set;}
    public List<string> Followers{get;set;}
}

The User object is simple and straightforward: just an ID plus two lists of IDs, one for the people I follow and one for the people following me. The feed is where I need help: getting all posts from the users I am following.

Option 1 - The easy route

This searches for all posts of people I follow just based on their UserId.

public class Post{
    public string UserId{get;set;}
    public string Content{get;set;}
}

Index

public class Posts : AbstractIndexCreationTask<Post>{
    public Posts(){
        Map = results => from r in results
                         select new{
                             r.UserId
                         };
    }
}

Querying

var posts = session.Query<Post,Posts>().Where(c=>c.UserId.In(peopleImFollowing));

This is the obvious route, but it smells bad. The query results in a bunch of OR clauses sent to Lucene. There is an upper limit of around 1024 clauses that Raven will handle, so no single user could follow more than about 1,000 people.

Option 2 - One post for each follower

public class Post{
    public string UserId{get;set;}
    public string RecipientId{get;set;}
    public string Content{get;set;}
}

Adding a new post

foreach(string followerId in me.Followers){
    session.Store(new Post{
        UserId = me.Id,
        RecipientId = followerId,
        Content = "foobar"
    });
}

This is simple to follow and easy to query but it seems like there would be way too many documents created... perhaps that doesn't matter though?

Option 3 - List of recipients

So far I like this the best.

public class Post{
    public string UserId{get;set;}
    public List<string> Recipients{get;set;}
    public string Content{get;set;}
}

Index

public class Posts : AbstractIndexCreationTask<Post>{
    public Posts(){
        Map = results => from r in results
                         select new{
                             r.UserId,
                             r.Recipients
                         };
    }
}

Adding new post

session.Store(new Post{
    UserId = me.Id,
    Recipients = me.Followers,
    Content = "foobar"
});

Querying

var posts = session.Query<Post,Posts>().Where(c=>c.Recipients.Any(r=>r == me.Id));

This seems like the best way but I have never worked with Lucene before. Would it be a problem for the index if someone has 10,000 followers? What if we want to post a message that goes to every single user? Perhaps there is another approach?

scott asked Dec 05 '25 05:12


1 Answer

From my perspective, only option 1 really works, and you will probably want to tune how RavenDB talks to Lucene if you want to support following more than 1024 users.
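If tuning is not possible, one workaround would be to split the list of followed IDs into batches that each stay under the clause limit. The following is only a sketch: `FollowBatches`, `Split` and the batch size of 1000 are hypothetical names and values chosen for illustration, and it uses nothing from RavenDB itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FollowBatches
{
    // Assumed safety margin below the limit of roughly 1024 clauses.
    public const int DefaultBatchSize = 1000;

    // Splits the followed-user IDs into consecutive batches of at most batchSize entries.
    public static List<List<string>> Split(IReadOnlyList<string> ids, int batchSize = DefaultBatchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<string>>();
        for (int i = 0; i < ids.Count; i += batchSize)
        {
            batches.Add(ids.Skip(i).Take(batchSize).ToList());
        }
        return batches;
    }
}
```

Each batch could then be passed to the `In(...)` query from option 1 and the results merged on the client. Merging and paging across batches is not shown, and it would need something to sort by, such as a timestamp on Post, which the model in the question does not have.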

Option 2 and Option 3 don't take into account that after you follow new users, you want their older tweets to show up in your timeline. Likewise, you want their tweets to disappear from your timeline after you unfollow them. To implement this with either of those two approaches, you would need to duplicate all of their tweets on every 'follow' operation and delete them again on 'unfollow'. This would make following/unfollowing a very expensive operation, and it could also fail (what if the server that contains part of the tweets isn't available at the moment you're doing this?).
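The behaviour of option 1 can be sketched in memory, without RavenDB: the timeline is computed from the current set of followed IDs at read time, so follow and unfollow only change that set. `Timeline.For` is a hypothetical helper, and `Post` mirrors the class from option 1.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public string UserId { get; set; }
    public string Content { get; set; }
}

public static class Timeline
{
    // Option 1 semantics: filter all posts by whoever is followed right now.
    // Nothing is copied per follower, so there is nothing to backfill or delete later.
    public static List<Post> For(IEnumerable<Post> allPosts, ISet<string> following)
    {
        return allPosts.Where(p => following.Contains(p.UserId)).ToList();
    }
}
```

Adding an ID to `following` makes that user's older posts appear on the next read, and removing it makes them disappear, with no per-post work in either case.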

Option 2 also has the immense disadvantage that it would produce enormous amounts of duplicate data. Think about famous users with millions of followers and thousands of posts. Then multiply this by thousands of famous users... not even Twitter can handle such amounts of data.
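To put rough numbers on this, here is a back-of-the-envelope calculation. The figures are hypothetical, not measurements, and `FanOutEstimate` is a name made up for this sketch.

```csharp
public static class FanOutEstimate
{
    // Option 2 stores one document per (post, follower) pair.
    public static long DocumentCount(long followers, long posts)
    {
        return checked(followers * posts);
    }
}
```

With these assumed figures, a single user with 1,000,000 followers and 5,000 posts gives `FanOutEstimate.DocumentCount(1_000_000, 5_000)`, which is 5,000,000,000 documents, compared with 5,000 documents for the same user under option 1.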

Option 3 also has the problem that queries against the index get slow, because every Lucene document would have this recipients field with perhaps millions of values. And you would have trillions of documents... I'm not a Lucene expert, but I don't think that would be fast enough to display the timeline (even ignoring that you are not the only user requesting a timeline at the same time).

As I said above, I think that only option 1 works. Maybe someone else has a better approach. Good question btw.

Daniel Lang answered Dec 08 '25 00:12