I am playing around a bit with RavenDB and trying to figure out the best way to model my objects for a Twitter-like scenario. So far I have come up with a few options, but I'm not sure which one is best.
public class User {
    public string Id { get; set; }
    public List<string> Following { get; set; }
    public List<string> Followers { get; set; }
}
The User object is simple and straightforward, just an ID and a list of IDs for people I follow and people following me. The feed setup is where I need help, getting all posts from users that I am following.
Option 1: this searches for all posts by the people I follow, based only on their UserId.
public class Post {
    public string UserId { get; set; }
    public string Content { get; set; }
}

// Index over posts, keyed by the author's id.
public class Posts : AbstractIndexCreationTask<Post> {
    public Posts() {
        Map = results => from r in results
                         select new {
                             r.UserId
                         };
    }
}

// In() is an extension method from the RavenDB client's LINQ namespace.
var posts = session.Query<Post, Posts>()
                   .Where(c => c.UserId.In(peopleImFollowing));
This is the obvious route, but it smells bad. The query turns into a long chain of OR clauses sent to Lucene, and Lucene by default allows at most 1024 clauses in a boolean query, so no user could follow more than about 1000 people.
Option 2: store one copy of each post per follower.

public class Post {
    public string UserId { get; set; }
    public string RecipientId { get; set; }
    public string Content { get; set; }
}

// One document per follower. User exposes Id, not UserId.
foreach (string followerId in me.Followers) {
    session.Store(new Post {
        UserId = me.Id,
        RecipientId = followerId,
        Content = "foobar"
    });
}
This is simple to follow and easy to query, but it seems like it would create far too many documents. Perhaps that doesn't matter, though?
Option 3: so far I like this one the best.
public class Post {
    public string UserId { get; set; }
    public List<string> Recipients { get; set; }
    public string Content { get; set; }
}

public class Posts : AbstractIndexCreationTask<Post> {
    public Posts() {
        Map = results => from r in results
                         select new {
                             r.UserId,
                             r.Recipients  // each id in the list is indexed as its own value
                         };
    }
}

// A single document carries the whole follower list.
session.Store(new Post {
    UserId = me.Id,
    Recipients = me.Followers,
    Content = "foobar"
});

// Post has no Recipient property, so the query goes through the Recipients collection.
var posts = session.Query<Post, Posts>()
                   .Where(c => c.Recipients.Any(id => id == me.Id));
This seems like the best way but I have never worked with Lucene before. Would it be a problem for the index if someone has 10,000 followers? What if we want to post a message that goes to every single user? Perhaps there is another approach?
From my perspective, only option 1 really works, and you will probably want to tune how RavenDB talks to Lucene if you want to support following more than 1024 users.
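If you would rather not touch the Lucene settings, one possible workaround (my own suggestion, not something RavenDB does for you) is to split the list of followed ids into batches that stay under the 1024-clause limit and run the option 1 query once per batch. A minimal sketch of the batching part only; `FollowingBatches`, `Split` and the default batch size of 1000 are hypothetical names and values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FollowingBatches
{
    // Hypothetical helper: splits the followed user ids into groups small
    // enough to stay below Lucene's default limit of 1024 boolean clauses.
    public static List<List<string>> Split(IReadOnlyList<string> ids, int batchSize = 1000)
    {
        if (ids == null) throw new ArgumentNullException(nameof(ids));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<string>>();
        for (int i = 0; i < ids.Count; i += batchSize)
        {
            // Take() stops at the end of the list, so the last batch may be shorter.
            batches.Add(ids.Skip(i).Take(batchSize).ToList());
        }
        return batches;
    }
}
```

Each batch would then be passed to the `In(...)` query from option 1 and the results merged on the client. That costs one round trip per batch, so it only makes sense when the number of batches stays small.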
Options 2 and 3 don't take into account that after you follow a new user, you want their older tweets to show up in your timeline. Likewise, you want those tweets to disappear from your timeline after you unfollow them. To implement this with either of those two approaches, you would need to duplicate all of their tweets on every 'follow' operation and delete them again on 'unfollow'. This would make following and unfollowing very expensive operations, and they could also fail (what if the server that holds some of the tweets isn't available at the moment you're doing this?).
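To make that cost concrete, here is a small in-memory stand-in for the copy-per-reader idea behind options 2 and 3. It is only a sketch: it does not use RavenDB at all, and `FanOutStore` and its methods are hypothetical names. `Follow` and `Unfollow` return how many timeline entries they had to write or delete.

```csharp
using System.Collections.Generic;

// In-memory illustration only; every name here is hypothetical.
public class FanOutStore
{
    private readonly Dictionary<string, List<string>> postsByAuthor =
        new Dictionary<string, List<string>>();
    private readonly Dictionary<string, HashSet<string>> followersByAuthor =
        new Dictionary<string, HashSet<string>>();
    // Reader id -> copied posts, each stored as (author id, content).
    private readonly Dictionary<string, List<KeyValuePair<string, string>>> timelines =
        new Dictionary<string, List<KeyValuePair<string, string>>>();

    // Publishing writes one copy into the timeline of every current follower.
    public void Publish(string author, string content)
    {
        Get(postsByAuthor, author).Add(content);
        foreach (var reader in Followers(author))
            Get(timelines, reader).Add(new KeyValuePair<string, string>(author, content));
    }

    // Following has to backfill the author's entire history for the new reader.
    public int Follow(string reader, string author)
    {
        if (!Followers(author).Add(reader)) return 0; // already following
        var backlog = Get(postsByAuthor, author);
        foreach (var content in backlog)
            Get(timelines, reader).Add(new KeyValuePair<string, string>(author, content));
        return backlog.Count;
    }

    // Unfollowing has to find and delete every copy again.
    public int Unfollow(string reader, string author)
    {
        if (!Followers(author).Remove(reader)) return 0; // was not following
        return Get(timelines, reader).RemoveAll(copy => copy.Key == author);
    }

    public int TimelineCount(string reader) => Get(timelines, reader).Count;

    private HashSet<string> Followers(string author)
    {
        if (!followersByAuthor.TryGetValue(author, out var set))
        {
            set = new HashSet<string>();
            followersByAuthor[author] = set;
        }
        return set;
    }

    private static List<T> Get<T>(Dictionary<string, List<T>> map, string key)
    {
        if (!map.TryGetValue(key, out var list))
        {
            list = new List<T>();
            map[key] = list;
        }
        return list;
    }
}
```

The work done by `Follow` and `Unfollow` grows with the number of posts the followed user has ever written, which is exactly what makes those two operations expensive under options 2 and 3.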
Option 2 also has the huge disadvantage that it would produce an enormous amount of duplicate data. Think about famous users with millions of followers and thousands of posts, then multiply that by thousands of famous users. Not even Twitter can handle such amounts of data.
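A quick back-of-the-envelope calculation shows the scale. The figures are made-up round numbers for illustration (1,000 famous users, 1,000,000 followers each, 1,000 posts each), and `FanOutEstimate` is a hypothetical helper:

```csharp
public static class FanOutEstimate
{
    // Option 2 stores one document per (post, follower) pair.
    // checked() makes an arithmetic overflow throw instead of wrapping silently.
    public static long Documents(long famousUsers, long followersEach, long postsEach)
    {
        return checked(famousUsers * followersEach * postsEach);
    }
}
```

With those assumed numbers, `FanOutEstimate.Documents(1000, 1000000, 1000)` comes out at 1,000,000,000,000, i.e. a trillion documents before counting any ordinary users.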
Option 3 has the additional problem that queries against the index get slow, because every Lucene document would have this 'recipient' field with perhaps millions of values, and you would have trillions of documents. I'm not a Lucene expert, but I don't think that works fast enough to display the timeline (even ignoring that you are not the only concurrent user who wants to display a timeline).
As I said above, I think that only option 1 works. Maybe someone else has a better approach. Good question btw.