How would you design an AppEngine datastore for a social site like Twitter?

I'm wondering what the best way is to design a social application on Google App Engine where members post activities and follow other members' activities.

To be more specific, let's assume we have these entities:

  • Users who have friends
  • Activities, which represent actions made by users (let's say each has a string message and a ReferenceProperty to its owner user, or it can use a parent association via App Engine's key)

The hard part is following your friends' activities, which means aggregating the latest activities from all your friends. Normally, that would be a join between the Activities table and your friends list, but that's not a viable design on App Engine because there are no joins. Simulating one would require firing N queries (where N is the number of friends) and then merging the results in memory, which is very expensive and will probably exceed the request deadline.
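To make the cost concrete, the N-queries-and-merge approach amounts to a k-way merge of per-friend result lists. Below is a minimal sketch in plain Python, assuming each friend's activities have already been fetched as a list sorted newest first. The `merge_latest` helper and the sample data are hypothetical and not part of any App Engine API.

```python
import heapq
from itertools import islice

def merge_latest(per_friend_activities, limit=10):
    """Merge per-friend activity lists into one feed, newest first.

    Each inner list holds (timestamp, message) tuples and must already be
    sorted newest first, as one query per friend would return them.
    """
    merged = heapq.merge(*per_friend_activities,
                         key=lambda activity: activity[0],
                         reverse=True)
    # Only the first `limit` items are pulled from the lazy merge.
    return list(islice(merged, limit))

# Hypothetical data: one list per friend, i.e. one query per friend.
feeds = [
    [(5, "alice: e"), (2, "alice: b")],
    [(4, "bob: d"), (1, "bob: a")],
    [(3, "carol: c")],
]
latest = merge_latest(feeds, limit=3)
```

The merge itself is cheap. The expensive part is producing the N input lists, since each one is a separate datastore query.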

I'm currently thinking of implementing this using inbox queues where creation of a new Activity will fire a background process that will put the new activity's key in the "inbox" of every following user:

  • Getting "all the users who follow X" is a possible App Engine query
  • Writing the results is a fairly cheap batch put into a new "Inbox" entity that basically stores (User, Activity Key) tuples
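The inbox idea (fan-out on write) can be sketched with in-memory dictionaries standing in for the datastore. This is only an illustration under that assumption: `followers`, `inboxes`, `follow`, `fan_out`, and `read_inbox` are hypothetical names, and in a real app the fan-out would run in a background task and write Inbox entities in batches.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for datastore entities.
followers = defaultdict(set)   # user -> set of users who follow that user
inboxes = defaultdict(list)    # user -> activity keys, oldest first

def follow(follower, followee):
    """Record that `follower` follows `followee`."""
    followers[followee].add(follower)

def fan_out(author, activity_key):
    """Copy a new activity's key into the inbox of every follower."""
    for user in followers[author]:
        inboxes[user].append(activity_key)

def read_inbox(user, limit=20):
    """Return up to `limit` activity keys for `user`, newest first."""
    return list(reversed(inboxes[user]))[:limit]

follow("bob", "alice")
follow("carol", "alice")
fan_out("alice", "activity-1")
fan_out("alice", "activity-2")
bob_feed = read_inbox("bob")
```

Reading a feed becomes a single lookup per user, at the price of one write per follower whenever an activity is created.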

I'd be happy to hear thoughts on this design or alternative suggestions.

Eran Kampf asked Oct 27 '09 11:10


1 Answer

Take a look at Building Scalable, Complex Apps on App Engine (pdf), a fascinating talk given at Google I/O by Brett Slatkin. He addresses the problem of building a scalable messaging service like Twitter.

Here's his solution using a list property:

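    from google.appengine.ext import db

    class Message(db.Model):
        sender = db.StringProperty()
        body = db.TextProperty()

    class MessageIndex(db.Model):
        # parent = a message
        receivers = db.StringListProperty()

    # Keys-only query: fetch only the keys of the matching index entities.
    indexes = MessageIndex.all(keys_only=True).filter('receivers =', user_id)
    # Each index's parent key is the key of the Message it belongs to.
    keys = [k.parent() for k in indexes]
    messages = db.get(keys)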

This keys-only query finds the message indexes with a receiver equal to the one you specified, without deserializing and serializing the list of receivers. You then use the parent keys of these indexes to fetch only the messages that you want.
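The two-step lookup can be mimicked in plain Python to show its shape. This is a rough sketch with dictionaries standing in for the datastore, not the App Engine API: `messages`, `message_indexes`, `send`, and `inbox` are hypothetical names, and the linear scan here replaces what the datastore would answer from its index on `receivers`.

```python
# Hypothetical in-memory stand-ins for the two entity kinds.
messages = {}         # message key -> {"sender": ..., "body": ...}
message_indexes = {}  # message key (the index's parent) -> set of receiver ids

def send(key, sender, body, receivers):
    """Store a message and, separately, its receiver index."""
    messages[key] = {"sender": sender, "body": body}
    message_indexes[key] = set(receivers)

def inbox(user_id):
    """Return the messages addressed to `user_id`."""
    # Step 1 ("keys-only"): collect parent keys of matching indexes.
    # No message body is touched in this step.
    keys = [key for key, receivers in message_indexes.items()
            if user_id in receivers]
    # Step 2: batch-get the messages by key.
    return [messages[key] for key in keys]

send("m1", "alice", "hi", ["bob", "carol"])
send("m2", "dave", "yo", ["carol"])
carol_bodies = [m["body"] for m in inbox("carol")]
```

Keeping the receivers list on a separate entity means the large list is never loaded when a message is read.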

Here's the wrong way to do it:

    from google.appengine.ext import db

    class Message(db.Model):
        sender = db.StringProperty()
        receivers = db.StringListProperty()
        body = db.TextProperty()

    # Every returned entity carries its full receivers list.
    messages = Message.all().filter('receivers =', user_id)

This is inefficient because the query has to deserialize every entity it returns, including each entity's full receivers list. So if you returned 100 messages with 1,000 users in each receivers list, you'd have to deserialize 100,000 (100 x 1,000) list property values. That is far too expensive in datastore latency and CPU.
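The arithmetic can be written out as a tiny helper. This is only a back-of-the-envelope model, and `list_values_deserialized` is a hypothetical name: it assumes the naive design loads every receivers list in full, while the keys-only index design loads none of them.

```python
def list_values_deserialized(num_messages, receivers_per_message, keys_only_index):
    """Count receivers-list values deserialized while reading an inbox."""
    if keys_only_index:
        # Keys-only queries return keys, so no list values are loaded.
        return 0
    # Each returned message brings its whole receivers list with it.
    return num_messages * receivers_per_message

naive = list_values_deserialized(100, 1000, keys_only_index=False)
indexed = list_values_deserialized(100, 1000, keys_only_index=True)
```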

I was pretty confused by all of this at first, so I wrote up a short tutorial about using the list property. Enjoy :)

wings answered Sep 30 '22 17:09