
Highly scalable tags on Google App Engine (Python)

I have a lot of posts (for example), each marked with one or more tags. Posts can be created or deleted, and a user can search for one or more tags (combined with logical AND). The first idea that came to mind was a simple model:

class Post(db.Model):
  #blahblah
  tags = db.StringListProperty()

Implementing create and delete is straightforward. Search is more complex: to search for N tags it runs N GQL queries like "SELECT * FROM Post WHERE tags = :1" and merges the results using cursors, which performs terribly.

The second idea is to store tags as separate entities:

class Post(db.Model):
    #blahblah
    tags = db.ListProperty(db.Key) # For fast access

class Tag(db.Model):
    name = db.StringProperty(name="key")
    posts = db.ListProperty(db.Key) # Keys of the posts marked with this tag

It fetches Tag entities from the datastore by key (much faster than fetching them with GQL) and merges their post lists in memory. I think this implementation performs better than the first one, but a very frequently used tag can exceed the maximum size allowed for a single datastore entity. There is another problem: the datastore can modify a single entity only about once per second, so frequently used tags also hit a write bottleneck.
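To make the in-memory merge concrete, here is a minimal sketch in plain Python, with no datastore calls. It assumes the Tag entities have already been fetched by key and reduced to a dict mapping each tag name to its list of post keys; `posts_with_all_tags` and `tag_to_posts` are hypothetical names used only for illustration.

```python
def posts_with_all_tags(tag_to_posts, wanted_tags):
    """Return the set of post keys that carry every tag in wanted_tags.

    tag_to_posts: dict of tag name -> iterable of post keys (hypothetical
    stand-in for the `posts` lists of already-fetched Tag entities).
    """
    if not wanted_tags:
        return set()
    # Build one set per requested tag; an unknown tag yields an empty set.
    post_sets = [set(tag_to_posts.get(tag, ())) for tag in wanted_tags]
    # Intersect starting from the smallest set to keep the work small.
    post_sets.sort(key=len)
    result = post_sets[0]
    for other in post_sets[1:]:
        result &= other
        if not result:
            break  # no post can match once the intersection is empty
    return result
```

The cost of this merge grows with the size of the post lists, which is the same growth that pushes popular Tag entities toward the entity size limit.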

Any suggestions?

gordon-quad asked Nov 14 '22 05:11


1 Answer

To follow up on Nick's question: if it is a logical AND across multiple tags in the query, use tags = tag1 AND tags = tag2 ... in a single query. Set membership on multi-valued properties is one of the datastore's shining features, so you can get your result in one query.

http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Properties_With_Multiple_Values
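As a rough sketch of that single query against the first model (Post with a tags StringListProperty), the helper below only builds the GQL string with one equality filter per tag. `build_tag_query` is a hypothetical name, and the way the string is passed to `db.GqlQuery` is shown in a comment because it needs the App Engine SDK to run.

```python
def build_tag_query(num_tags, kind="Post", prop="tags"):
    """Build a GQL string that ANDs one equality filter per requested tag.

    Hypothetical helper: positional placeholders :1..:N are filled in later
    by whatever executes the query.
    """
    if num_tags < 1:
        raise ValueError("at least one tag is required")
    filters = " AND ".join(
        "%s = :%d" % (prop, i) for i in range(1, num_tags + 1)
    )
    return "SELECT * FROM %s WHERE %s" % (kind, filters)


# Assumed usage inside an App Engine app (not executed here):
#   from google.appengine.ext import db
#   tags = ["python", "gae"]
#   query = db.GqlQuery(build_tag_query(len(tags)), *tags)
#   posts = query.fetch(20)
```

Each equality filter matches a post if any value in its tags list equals the given tag, so ANDing them returns only posts that carry every requested tag.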

kevpie answered Nov 23 '22 23:11