How many Datastore reads do Fetch, Count and Query operations each consume?

On the Google App Engine groups I keep reading about users (Fig1, Fig2, Fig3) who can't figure out where the high number of Datastore reads in their billing reports comes from.
As you might know, free Datastore reads are capped at 50K operations/day; above this quota you have to pay.

50K operations sounds like a lot, but unfortunately it seems that each operation (Query, entity fetch, Count, ...) hides several Datastore reads.

Is it possible to find out, via the API or some other approach, how many Datastore reads are hidden behind the common RPC.get and RPC.runquery calls?

Appstats seems useless here because it only gives the RPC details, not the hidden read cost.

Given a simple model like this:

```python
from google.appengine.ext import db

class Example(db.Model):
    foo = db.StringProperty()
    bars = db.ListProperty(str)
```

and 1000 entities in the datastore, I'm interested in the cost of these kinds of operations:

```python
items_count = Example.all(keys_only=True).filter('bars =', 'spam').count()
items_count = Example.all().count(10000)
items = Example.all().fetch(10000)
items = Example.all().filter('bars =', 'spam').filter('bars =', 'fu').fetch(10000)
items = Example.all().fetch(10000, offset=500)
# filtr is a string defined elsewhere; this query is charged when iterated or fetched
items = Example.all().filter('foo >=', filtr).filter('foo <', filtr + u'\ufffd')
```
asked Oct 18 '11 at 12:10 by systempuntoout

2 Answers

See http://code.google.com/appengine/docs/billing.html#Billable_Resource_Unit_Cost. A query costs you 1 read plus 1 read for each entity returned, and "returned" includes entities skipped by an offset or counted by count(). So each of these costs 1001 reads:

```python
Example.all(keys_only=True).filter('bars =', 'spam').count()
Example.all().count(1000)
Example.all().fetch(1000)
Example.all().fetch(1000, offset=500)
```
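A minimal sketch of the billing rule described above, assuming only what this answer states: 1 read per query plus 1 read per entity returned, where entities skipped by an offset or counted by count() are billed too. The function names are hypothetical; this is plain arithmetic modeling the 2011 pricing, not an App Engine API.

```python
def fetch_read_cost(matched, limit, offset=0):
    """Reads charged for query.fetch(limit, offset) under the rule above.

    matched: number of entities satisfying the query's filters (hypothetical input).
    Entities skipped by the offset are billed exactly like returned ones.
    """
    touched = min(matched, offset + limit)
    return 1 + touched


def count_read_cost(matched, limit):
    """Reads charged for query.count(limit): every counted entity is billed."""
    return 1 + min(matched, limit)


if __name__ == "__main__":
    total = 1000  # entities in the datastore, as in the question
    print(count_read_cost(total, 1000))              # 1001, Example.all().count(1000)
    print(fetch_read_cost(total, 1000))              # 1001, Example.all().fetch(1000)
    print(fetch_read_cost(total, 1000, offset=500))  # 1001, 500 skipped + 500 returned
    print(50000 // fetch_read_cost(total, 1000))     # 49 full fetches fit in 50K free reads
```

The last line shows why the 50K daily quota disappears quickly: under this rule, fetching all 1000 entities fewer than 50 times a day already uses it up.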

For these, the number of reads charged is 1 plus the number of entities that match the filters:

```python
Example.all().filter('bars =', 'spam').filter('bars =', 'fu').fetch(1000)
Example.all().filter('foo >=', filtr).filter('foo <', filtr + u'\ufffd').fetch(1000)
```
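To make "1 plus the number of entities that match the filters" concrete, here is a small in-memory simulation. The entity data is made up, the filters are ordinary Python predicates standing in for the datastore filters (not the db API), and it assumes the fetch limit is at least the number of matches.

```python
def filtered_fetch_cost(entities, predicate):
    """Return the matching entities and the reads charged: 1 for the query + 1 per match."""
    matches = [e for e in entities if predicate(e)]
    return matches, 1 + len(matches)


# Hypothetical in-memory stand-ins for 1000 Example entities.
entities = []
for i in range(1000):
    bars = ["spam"] if i % 2 == 0 else []
    if i % 10 == 0:
        bars.append("fu")
    entities.append({"foo": "item%04d" % i, "bars": bars})

# filter('bars =', 'spam').filter('bars =', 'fu'): two equality filters on a
# list property match entities whose list contains BOTH values.
both, cost_both = filtered_fetch_cost(
    entities, lambda e: "spam" in e["bars"] and "fu" in e["bars"])

# filter('foo >=', filtr).filter('foo <', filtr + u'\ufffd'): a prefix match on foo.
filtr = u"item00"
prefix, cost_prefix = filtered_fetch_cost(
    entities, lambda e: filtr <= e["foo"] < filtr + u"\ufffd")

print(len(both), cost_both)      # 100 101
print(len(prefix), cost_prefix)  # 100 101
```

Only the 100 matching entities are billed in each case, regardless of how many entities the kind holds in total.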

Instead of using count(), you should consider storing the count in the datastore, sharded if you need to update it more than about once a second. http://code.google.com/appengine/articles/sharding_counters.html
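The idea behind a sharded counter can be sketched in plain Python. This is an in-memory model only (the class name and shard count are hypothetical, and nothing here touches the datastore); in the real version each shard is its own entity, so reading the total costs a handful of reads proportional to the number of shards rather than to the number of Example entities.

```python
import random


class ShardedCounter:
    """In-memory model of a sharded counter (hypothetical, no datastore calls).

    Each increment goes to a randomly chosen shard, so concurrent writers
    rarely contend for the same entity; the total is the sum of all shards.
    """

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # In a datastore-backed version this reads every shard once,
        # independent of how many counted entities exist.
        return sum(self.shards)


counter = ShardedCounter(num_shards=20)
for _ in range(1000):
    counter.increment()
print(counter.total())  # 1000
```

More shards allow more concurrent increments at the price of a slightly more expensive read of the total.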

Whenever possible, you should use cursors instead of an offset.
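Why cursors help follows from the same rule: with an offset, every page pays again for all the entities it skips, while a cursor resumes where the previous page stopped. A rough sketch of the cost of paging through a whole kind, assuming the billing rule from this answer (function names are hypothetical):

```python
import math


def paging_cost_with_offset(total, page_size):
    """Reads to page through `total` entities with fetch(page_size, offset=k*page_size).

    Each page pays 1 read for the query plus 1 read per entity skipped or returned.
    """
    pages = math.ceil(total / page_size)
    cost = 0
    for k in range(pages):
        cost += 1 + min(total, k * page_size + page_size)
    return cost


def paging_cost_with_cursor(total, page_size):
    """Reads when each page resumes from the previous page's cursor: nothing is skipped."""
    pages = math.ceil(total / page_size)
    cost = 0
    for k in range(pages):
        returned = min(page_size, total - k * page_size)
        cost += 1 + returned
    return cost


print(paging_cost_with_offset(1000, 100))  # 5510
print(paging_cost_with_cursor(1000, 100))  # 1010
```

The offset cost grows roughly quadratically with the number of pages, while the cursor cost stays linear in the number of entities.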

answered Sep 20 '22 at 10:09 by ribrdb


One detail I'm almost, but not completely, sure about:

```python
Example.all().count(10000)
```

This one uses small datastore operations (it doesn't need to fetch the entities, only their keys), so it would count as 1 read plus at most 10,000 small operations.
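A tiny sketch of the cost breakdown this answer describes, assuming count() is billed as 1 read for the query plus 1 small operation per key counted. Note that this differs from the other answer's model, which bills every counted entity as a read; the function name is hypothetical.

```python
def count_cost(matched, limit):
    """(reads, small_ops) for query.count(limit) under this answer's reading:
    1 read for the query, plus 1 small operation per key counted."""
    counted = min(matched, limit)
    return 1, counted


reads, small_ops = count_cost(matched=1000, limit=10000)
print(reads, small_ops)  # 1 1000
```

With the question's 1000 entities, the 10,000 limit is only an upper bound: 1000 keys are counted, so 1000 small operations are charged.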

answered Sep 19 '22 at 10:09 by Barak