I have a lot of items (posts, for example), each marked with one or more tags. Posts can be created or deleted, and users can also search for one or more tags (combined with a logical AND). The first idea that came to my mind was a simple model:
```python
from google.appengine.ext import db

class Post(db.Model):
    # ... other properties ...
    tags = db.StringListProperty()
```
Implementing the create and delete operations is straightforward. Search is harder: to search for N tags, it runs N GQL queries like "SELECT * FROM Post WHERE tags = :1" and merges the results using cursors, and the performance is terrible.
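To make the cost of this first approach concrete, here is a minimal pure-Python sketch that simulates it in memory. It assumes a plain dict in place of the datastore; `POSTS`, `posts_with_tag` and `search_all_tags` are hypothetical names, and no App Engine API is called.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stand-in for Post entities: post id -> list of tags
# (the role played by the StringListProperty in the model above).
POSTS: Dict[int, List[str]] = {
    1: ["python", "appengine"],
    2: ["python", "datastore"],
    3: ["python", "appengine", "datastore"],
}


def posts_with_tag(tag: str) -> Set[int]:
    """Mimics one query of the form 'SELECT * FROM Post WHERE tags = :1'."""
    return {pid for pid, tags in POSTS.items() if tag in tags}


def search_all_tags(tags: List[str]) -> Set[int]:
    """Runs one lookup per tag and intersects the results (logical AND).

    Every lookup returns all posts carrying that tag, so the work grows with
    the size of each per-tag result, not with the size of the final answer.
    """
    if not tags:
        return set()
    result = posts_with_tag(tags[0])
    for tag in tags[1:]:
        result &= posts_with_tag(tag)
        if not result:  # stop early once nothing can match
            break
    return result


print(sorted(search_all_tags(["python", "appengine"])))  # [1, 3]
```

The sketch shows why popular tags hurt: a query for "python" fetches every Python post even when the AND with a rarer tag leaves only a few.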
My second idea was to move the tags into separate entities:
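```python
from google.appengine.ext import db

class Post(db.Model):
    # ... other properties ...
    tags = db.ListProperty(db.Key)  # Keys of this post's Tag entities, for fast access

class Tag(db.Model):
    name = db.StringProperty(name="key")
    posts = db.ListProperty(db.Key)  # Keys of the posts marked with this tag
```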
Search fetches the Tag entities by key (much faster than fetching them with GQL) and intersects their post lists in memory. I think this implementation performs better than the first one, but the post list of a very frequently used tag can exceed the maximum size allowed for a single datastore entity. There is another problem: the datastore can update a single entity only about once per second, so frequently used tags also become a write bottleneck.
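The read path of the second approach can be sketched in pure Python as follows, assuming a dict keyed by tag name stands in for the Tag entities and a dict lookup stands in for a batch get by key. `TAG_INDEX`, `fetch_tags` and `search_by_tag_entities` are hypothetical names; the sketch does not address the entity-size or write-rate limits described above.

```python
from typing import Dict, Iterable, Set

# Hypothetical in-memory stand-in for Tag entities, keyed by tag name:
# tag name -> set of post keys (the role of Tag.posts in the model above).
TAG_INDEX: Dict[str, Set[str]] = {
    "python": {"post-1", "post-2", "post-3"},
    "appengine": {"post-1", "post-3"},
    "datastore": {"post-2", "post-3"},
}


def fetch_tags(names: Iterable[str]) -> Dict[str, Set[str]]:
    """Mimics a batch get of Tag entities by key; unknown tags come back empty."""
    return {name: TAG_INDEX.get(name, set()) for name in names}


def search_by_tag_entities(names: Iterable[str]) -> Set[str]:
    """Intersects the post-key lists of all requested tags (logical AND).

    Starting from the smallest list keeps the intermediate set small.
    """
    tag_lists = sorted(fetch_tags(names).values(), key=len)
    if not tag_lists:
        return set()
    result = set(tag_lists[0])  # copy so the stored index is not mutated
    for posts in tag_lists[1:]:
        result &= posts
    return result


print(sorted(search_by_tag_entities(["appengine", "datastore"])))  # ['post-3']
```

Note that every tag's full post list must be loaded even when the intersection is tiny, which is the same scaling concern as the first approach, just with cheaper reads.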
Any suggestions?
To follow up on Nick's question: if the search is a logical AND across multiple tags, put every tag in the same query, as in tags = tag1 AND tags = tag2 ... Testing set membership on a list property in a single query is one of the datastore's strongest features, so you can get your result with one query.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Properties_With_Multiple_Values
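A minimal sketch of building that single query, assuming you assemble the GQL string yourself. `build_and_tag_query` is a hypothetical helper; the `Post` kind and `tags` property come from the first model in the question. Each "tags = :n" condition on a list property matches entities whose list contains that value, so ANDing them requires all the tags to be present.

```python
from typing import List, Tuple


def build_and_tag_query(tags: List[str]) -> Tuple[str, List[str]]:
    """Builds one GQL query requiring every tag (logical AND).

    Duplicate tags are dropped (order preserved), since repeating an
    equality filter on the same value adds nothing.
    """
    unique = list(dict.fromkeys(tags))
    if not unique:
        raise ValueError("at least one tag is required")
    conditions = " AND ".join(f"tags = :{i}" for i in range(1, len(unique) + 1))
    return f"SELECT * FROM Post WHERE {conditions}", unique


gql, args = build_and_tag_query(["python", "appengine"])
print(gql)   # SELECT * FROM Post WHERE tags = :1 AND tags = :2
print(args)  # ['python', 'appengine']
```

On App Engine the string and arguments would then be passed to `db.GqlQuery(gql, *args)`; calling `filter('tags =', tag)` once per tag on `Post.all()` should express the same query without building a string.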