
Most Efficient One-To-Many Relationships in Google App Engine Datastore?

Sorry if this question is too simple; I'm only entering 9th grade.

I'm trying to learn about NoSQL database design. I want to design a Google Datastore model that minimizes the number of read/writes.

Here is a toy example for a blog post and comments in a one-to-many relationship. Which is more efficient - storing all of the comments in a StructuredProperty or using a KeyProperty in the Comment model?

Again, the objective is to minimize the number of read/writes to the datastore. You may make the following assumptions:

  • Comments will not be retrieved independently of their respective blog post. (I suspect that this makes the StructuredProperty most preferable.)
  • Comments will need to be sortable by date, rating, author, etc. (Subproperties in the datastore cannot be indexed, so perhaps this could affect performance?)
  • Both blog posts and comments may be edited (or even deleted) after they are created.

Using StructuredProperty:

from google.appengine.ext import ndb

class Comment(ndb.Model):
    pass  # various properties...

class BlogPost(ndb.Model):
    # Comments are embedded in the BlogPost entity itself.
    comments = ndb.StructuredProperty(Comment, repeated=True)
    # various other properties...

Using KeyProperty:

from google.appengine.ext import ndb

class BlogPost(ndb.Model):
    pass  # various properties...

class Comment(ndb.Model):
    # Each comment is its own entity that points back at its post.
    blogPost = ndb.KeyProperty(kind=BlogPost)
    # various other properties...

Feel free to bring up any other considerations that relate to efficiently representing a one-to-many relationship with regards to minimizing the number of read/writes to the datastore.

Thanks.

user1566851 asked Jul 31 '12 20:07



2 Answers

I could be wrong, but from what I understand, a StructuredProperty is just a property within an entity, but with sub-properties.

This means reading a BlogPost and all its comments would only cost one read. So when you render your page, you only need one read op for your entire page.

Writes would be cheaper too. You'll need one read op to get the BlogPost, and as long as you don't update any indexed properties, it'll be just one write op.

You can handle the comment sorting on your own after you read the entity out of the datastore.
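As a rough illustration of sorting after the read, here is a minimal sketch in plain Python. The Comment tuple and its fields (author, rating, created) are made up for the example and only stand in for whatever the real sub-properties are; nothing here touches the datastore.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the comments read back together with a post.
Comment = namedtuple("Comment", ["author", "rating", "created"])

comments = [
    Comment("bob", 3, datetime(2012, 7, 31, 20, 7)),
    Comment("alice", 5, datetime(2012, 8, 1, 9, 30)),
    Comment("carol", 4, datetime(2012, 7, 30, 12, 0)),
]

def sort_comments(comments, field, descending=False):
    """Return a new list ordered by the named field, entirely in memory."""
    return sorted(comments, key=lambda c: getattr(c, field), reverse=descending)

newest_first = sort_comments(comments, "created", descending=True)
top_rated = sort_comments(comments, "rating", descending=True)
by_author = sort_comments(comments, "author")
```

Since the whole list is already in memory after the single read, any sort order costs CPU time but no extra datastore ops.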

You'll have to synchronize your comment updates/edits with transactions to make sure one comment doesn't overwrite another, since they all modify the same entity. You may run into unsolvable contention problems if everyone is commenting on and editing the same blog post at the same time.
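To make the overwrite problem concrete, here is a small simulation in plain Python. It is not ndb code: the store dict, the version counter, and the helper names are all assumptions made up for the sketch. In ndb itself you would get the same protection by doing the read-modify-write inside a transaction.

```python
import copy

# In-memory stand-in for one stored BlogPost entity (hypothetical layout).
store = {"post": {"version": 0, "comments": []}}

def read_post():
    # Each request works on its own private copy of the entity.
    return copy.deepcopy(store["post"])

def blind_write(post):
    # Overwrites whatever is stored, like a put() outside a transaction.
    store["post"] = post

def checked_write(post, seen_version):
    # Commit only if nobody else committed since this copy was read.
    if store["post"]["version"] != seen_version:
        return False
    post["version"] = seen_version + 1
    store["post"] = post
    return True

def add_comment(text, max_retries=3):
    # Read, modify, write with a version check; retry on conflict.
    for _ in range(max_retries):
        post = read_post()
        seen = post["version"]
        post["comments"].append(text)
        if checked_write(post, seen):
            return True
    return False

# Unsafe: two requests read the same post, then both write blindly.
a, b = read_post(), read_post()
a["comments"].append("first!")
b["comments"].append("nice post")
blind_write(a)
blind_write(b)
lost_update_result = list(store["post"]["comments"])  # a's comment is gone

# Safe: same interleaving, but the stale write is rejected and retried.
store["post"] = {"version": 0, "comments": []}
a, b = read_post(), read_post()
a["comments"].append("first!")
b["comments"].append("nice post")
ok_a = checked_write(a, 0)
ok_b = checked_write(b, 0)   # stale copy, refused
retried = add_comment("nice post")
safe_result = list(store["post"]["comments"])
```

The retry loop is also where the contention shows up: the more people write to the same post at once, the more attempts get refused and repeated.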

In optimizing for cost though, you'll hit a wall with the maximum entity size of 1MB. This will limit the number of comments you can store per blog post.
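For a feel of where that wall is, here is a back-of-the-envelope estimate. It assumes 1MB means 1024 * 1024 bytes and ignores serialization overhead, so treat the result as an optimistic upper bound; the function name and the sample sizes are made up.

```python
MAX_ENTITY_BYTES = 1024 * 1024  # the 1MB entity limit, taken as 1 MiB here

def max_embedded_comments(avg_comment_bytes, post_body_bytes=0):
    """Rough upper bound on comments that fit inside one BlogPost entity."""
    if avg_comment_bytes <= 0:
        raise ValueError("avg_comment_bytes must be positive")
    remaining = MAX_ENTITY_BYTES - post_body_bytes
    return max(remaining, 0) // avg_comment_bytes

# Example: a 20,000-byte post with comments averaging 500 bytes each.
estimate = max_embedded_comments(500, post_body_bytes=20000)
print(estimate)
```

Long comments or a long post body eat into the budget quickly, so the real ceiling depends a lot on what the comments look like.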

Going with the KeyProperty would be quite a bit more expensive.

You'll need one read to get the blog post, plus 1 query plus 1 small read op for each comment.

Every comment is a new entity, so it'll be at least 4 write ops. You may want to index for sort order, so that'll end up costing even more write ops.
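To put the two designs side by side, here is a rough tally that uses only the op counts quoted in this answer. They are my estimates, not official pricing, and the function names are invented for the sketch.

```python
def structured_property_ops():
    """Op counts when comments are embedded in the BlogPost entity."""
    return {
        "view_page": {"read": 1},                # post and comments in one get
        "add_comment": {"read": 1, "write": 1},  # assumes no indexed property changes
    }

def key_property_ops(num_comments, writes_per_new_entity=4):
    """Op counts when each comment is its own entity with a KeyProperty."""
    return {
        "view_page": {"read": 1, "query": 1, "small": num_comments},
        "add_comment": {"write": writes_per_new_entity},  # more if extra indexes
    }

def total(ops):
    """Add up every op in one action, ignoring that op types are priced differently."""
    return sum(ops.values())

embedded = structured_property_ops()
separate = key_property_ops(num_comments=50)
print("view page:", total(embedded["view_page"]), "vs", total(separate["view_page"]))
print("add comment:", total(embedded["add_comment"]), "vs", total(separate["add_comment"]))
```

The gap on page views grows with the number of comments, while the cost of adding a comment stays flat in both designs.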

On the plus side, you'll have unlimited comments per blog post, and you don't have to worry about synchronizing new comments. You might need to worry about synchronization for editing comments, but if you limit editing to the comment's creator, that shouldn't really be a problem. You don't have to do the sorting yourself either.

It's a cost vs features tradeoff.

dragonx answered Nov 10 '22 01:11


What about:

from google.appengine.ext import ndb

class Comment(ndb.Model):
    pass  # various properties...

class BlogPost(ndb.Model):
    # The post holds a list of keys; each comment is a separate entity.
    comments = ndb.KeyProperty(kind=Comment, repeated=True)
    # various other properties...

This way, you can store up to 5000 comments per blog post (the maximum number of repeated properties) independent of the size of each blog post. You won't need a query to fetch the comments for a blog post; you can just do ndb.get_multi(blog_post.comments). And for this operation, you can try to rely on ndb's memcache. Of course, it depends on your use case whether this is a good assumption or not.

Marc answered Nov 10 '22 01:11