Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

parent->child relationships in appengine python (bigtable)

I'm still learning my lessons about data modeling in bigtable/nosql and would appreciate some feedback. Would it be fair to say that I should avoid parent->child relationships in my data modeling if I frequently need to deal with the children in aggregate across parents?

As an example, let's say I'm building a blog that will be contributed to by a number of authors, and each other has posts, and each post has tags. So I could potentially set up something like this:

class Author(db.Model): 
  owner = db.UserProperty()

class Post(db.Model): 
  owner = db.ReferenceProperty(Author, 
    collection_name='posts') 
  tags = db.StringListProperty() 

As I understand this will create an entity group based on the Author parent. Does this cause inefficiency if I mostly need to query for Posts by tags which I expect to cut across multiple Authors?

I understand doing a query on list properties can be inefficient. Let's say each post has about 3 tags on average, but could go all the way up to 7. And I expect my collection of possible tags to be in the low hundreds. Is there any benefit to altering that model to something like this?

class Author(db.Model): 
  owner = db.UserProperty()

class Post(db.Model): 
  owner = db.ReferenceProperty(Author, 
    collection_name='posts') 
  tags = db.ListProperty(db.Key)

class Tag(db.Model): 
  name = db.StringProperty() 

Or would I be better off doing something like this?

class Author(db.Model): 
  owner = db.UserProperty()

class Post(db.Model): 
  owner = db.ReferenceProperty(Author, 
    collection_name='posts')

class Tag(db.Model): 
  name = db.StringProperty() 

class PostTag(db.Model): 
  post = db.ReferenceProperty(Post, 
    collection_name='posts') 
  tag = db.ReferenceProperty(Tag, 
    collection_name='tags') 

And last question... what if my most common use case will be querying for posts by multiple tags. E.g., "find all posts with tags in {'apples', 'oranges', 'cucumbers', 'bicycles'}" Is one of these approaches more appropriate for a query that looks for posts that have any of a collection of tags?

Thanks, I know that was a mouthful. :-)

like image 816
Bob Ralian Avatar asked Feb 28 '11 06:02

Bob Ralian


1 Answers

Something like the first or second approach are well suited for App Engine. Consider the following setup:

class Author(db.Model): 
  owner = db.UserProperty()

class Post(db.Model): 
  author = db.ReferenceProperty(Author, 
    collection_name='posts') 
  tags = db.StringListProperty()

class Tag(db.Model): 
  post_count = db.IntegerProperty()

If you use the string tag (case-normalized) as the Tag entity key_name, you can efficiently query for posts with a specific tag, or list the tags of a post, or fetch tag statistics:

post = Post(author=some_author, tags=['app-engine', 'google', 'python'])
post_key = post.put()
# call some method to increment post counts...
increment_tag_post_counts(post_key)

# get posts with a given tag:
matching_posts = Post.all().filter('tags =', 'google').fetch(100)
# or, two tags:
matching_posts = Post.all().filter('tags =', 'google').filter('tags =', 'python').fetch(100)

# get tag list from a post:
tag_stats = Tag.get_by_key_name(post.tags)

The third approach requires additional queries or fetches for most basic operations, and it is more difficult if you want to query for multiple tags.

like image 76
Robert Kluin Avatar answered Sep 22 '22 06:09

Robert Kluin