On the Google App Engine groups I keep reading about users (Fig1, Fig2, Fig3) who can't figure out where the high number of Datastore reads in their billing reports comes from.
As you might know, the free quota for Datastore reads is 50K operations/day; above that you have to pay.
50K operations sounds like a lot, but unfortunately each operation (query, entity fetch, count, ...) seems to hide several Datastore reads.
Is it possible to find out, via the API or some other approach, how many Datastore reads are hidden behind the common RPC.get and RPC.runquery calls?
Appstats doesn't help here, because it only shows the RPC details, not the hidden read cost.
Given a simple model like this:

```python
from google.appengine.ext import db

class Example(db.Model):
    foo = db.StringProperty()
    bars = db.ListProperty(str)
```

and 1000 entities in the datastore, I'm interested in the cost of these kinds of operations:
```python
# filtr is a prefix string used for a "starts with" match on foo.
items_count = Example.all(keys_only=True).filter('bars =', 'spam').count()
items_count = Example.all().count(10000)
items = Example.all().fetch(10000)
items = Example.all().filter('bars =', 'spam').filter('bars =', 'fu').fetch(10000)
items = Example.all().fetch(10000, offset=500)
items = Example.all().filter('foo >=', filtr).filter('foo <', filtr + u'\ufffd').fetch(10000)
```
The Python Datastore API provides two classes for preparing and executing queries: Query, which uses method calls to prepare the query, and GqlQuery, which uses a SQL-like query language called GQL to prepare the query from a query string.
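As a rough illustration of how the two styles correspond, the hypothetical helper below builds a GQL string with positional bind parameters from the same kind of (property, operator, value) filters you would pass to Query.filter(). It is plain Python and does not call the App Engine SDK; the helper name and its input format are assumptions for this sketch.

```python
def to_gql(kind, filters):
    """Build a GQL query string and its positional arguments.

    Hypothetical helper for illustration only: `filters` is a list of
    (property, operator, value) tuples, mirroring Query.filter() calls.
    """
    clauses = []
    args = []
    for position, (prop, op, value) in enumerate(filters, start=1):
        # GQL uses :1, :2, ... for positional bind parameters.
        clauses.append("%s %s :%d" % (prop, op, position))
        args.append(value)
    gql = "SELECT * FROM %s" % kind
    if clauses:
        gql += " WHERE " + " AND ".join(clauses)
    return gql, args


gql, args = to_gql("Example", [("bars", "=", "spam"), ("bars", "=", "fu")])
print(gql)   # SELECT * FROM Example WHERE bars = :1 AND bars = :2
print(args)  # ['spam', 'fu']
```

In real code the resulting string and arguments would be passed to GqlQuery, e.g. `db.GqlQuery(gql, *args)`, which should describe the same query as chaining `.filter('bars =', 'spam').filter('bars =', 'fu')` on `Example.all()`.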
For highly related or hierarchical data, the Datastore allows entities to be stored in a parent/child relationship. This is known as an entity group or ancestor/descendant relationship. For example, an entity group might contain entities of the kinds Person, Pet, and Toy, where a Person is the parent of a Pet and a Pet is the parent of a Toy; the "grandparent" in this relationship is the Person.
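A child entity's key records its whole ancestor path, and the root of that path identifies the entity group. The sketch below models keys as tuples of (kind, name) pairs; this representation and the helper names are hypothetical simplifications, not the SDK's Key class.

```python
def make_key(*path):
    """Build a simplified key: a tuple of (kind, name) pairs, root first.

    Hypothetical stand-in for a real datastore Key, used only to show
    how ancestor paths work.
    """
    if len(path) % 2:
        raise ValueError("path must alternate kind, name")
    return tuple(zip(path[::2], path[1::2]))


def parent(key):
    """Return the parent key, or None for a root entity."""
    return key[:-1] or None


def entity_group_root(key):
    """The first (kind, name) pair in the path identifies the entity group."""
    return key[:1]


toy = make_key("Person", "alice", "Pet", "rex", "Toy", "ball")
print(parent(toy))             # (('Person', 'alice'), ('Pet', 'rex'))
print(entity_group_root(toy))  # (('Person', 'alice'),)
```

In the db API, this hierarchy is created by passing `parent=` when constructing a model instance.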
According to the billing documentation (http://code.google.com/appengine/docs/billing.html#Billable_Resource_Unit_Cost), a query costs 1 read plus 1 read for each entity returned, and "returned" includes entities skipped by an offset or by count(). So each of these costs 1001 reads:
```python
# 1001 reads for the filtered count only if all 1000 entities match 'spam'.
Example.all(keys_only=True).filter('bars =', 'spam').count()
Example.all().count(1000)
Example.all().fetch(1000)
Example.all().fetch(1000, offset=500)
```
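The billing rule quoted above fits in a one-line cost function. This is a sketch of that rule only (1 read per query, plus 1 per entity returned or skipped); the function name and parameters are hypothetical, not part of any App Engine API.

```python
def query_read_cost(returned, skipped=0):
    """Datastore reads billed for one query, per the quoted rule.

    returned: entities actually returned (or counted) by the query.
    skipped:  entities skipped by an offset, which are still billed.
    """
    return 1 + returned + skipped


# 1000 Example entities in the datastore:
print(query_read_cost(1000))              # 1001, e.g. fetch(1000) or count(1000)
print(query_read_cost(500, skipped=500))  # 1001, e.g. fetch(1000, offset=500)

# With a free quota of 50,000 reads/day, a 1001-read query fits 49 times.
print(50000 // query_read_cost(1000))     # 49
```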
For these, the number of reads charged is 1 plus the number of entities that match the filters:
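```python
# filtr is a prefix string; u'\ufffd' acts as an upper bound for "starts with".
Example.all().filter('bars =', 'spam').filter('bars =', 'fu').fetch(1000)
Example.all().filter('foo >=', filtr).filter('foo <', filtr + u'\ufffd').fetch(1000)
```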
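To see where "1 plus the number of matching entities" comes from, here is an in-memory simulation with plain dicts standing in for Example entities. The sample data is made up for illustration and no datastore is involved; it only mimics how an equality filter on a list property and the u'\ufffd' prefix range select entities.

```python
def matching(entities, predicate):
    return [e for e in entities if predicate(e)]


def read_cost(matches):
    # 1 read for the query plus 1 read per matching entity returned.
    return 1 + len(matches)


# Hypothetical sample data: 1000 entities, every 10th tagged "spam",
# every 20th also tagged "fu".
entities = []
for i in range(1000):
    bars = []
    if i % 10 == 0:
        bars.append("spam")
    if i % 20 == 0:
        bars.append("fu")
    entities.append({"foo": u"item%04d" % i, "bars": bars})

# bars = 'spam' AND bars = 'fu': an equality filter on a list property
# matches if any element of the list equals the value.
both = matching(entities, lambda e: "spam" in e["bars"] and "fu" in e["bars"])
print(read_cost(both))  # 51

# Prefix match: foo >= filtr AND foo < filtr + u'\ufffd'.
filtr = u"item00"
prefixed = matching(entities, lambda e: filtr <= e["foo"] < filtr + u"\ufffd")
print(read_cost(prefixed))  # 101
```

The entities that fail the filters are never returned, so under the quoted rule they add nothing to the bill.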
Instead of calling count(), consider storing the count in the datastore, using sharded counters if you need to update it more than about once a second: http://code.google.com/appengine/articles/sharding_counters.html
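The idea behind sharded counters, reduced to an in-memory sketch: the count is split across N shard records, each increment touches one randomly chosen shard (so concurrent writers rarely contend for the same record), and reading the total means reading all N shards. The class name and shard count are illustrative assumptions; in the datastore each shard would be its own entity.

```python
import random


class ShardedCounter(object):
    """In-memory stand-in for a datastore sharded counter (illustrative)."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a random shard so writes spread across shard records.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # Reading the count means reading every shard: num_shards reads,
        # independent of how many Example entities exist.
        return sum(self.shards)


counter = ShardedCounter(num_shards=20)
for _ in range(1000):
    counter.increment()
print(counter.total())  # 1000
```

Under the per-entity billing rule, reading 20 shard entities would cost about 20 reads, compared with 1001 for a count() over 1000 entities.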
Whenever possible, use query cursors instead of an offset, because entities skipped by an offset are still billed as reads.
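A rough comparison, assuming the quoted billing rule (skipped entities are billed), of paging through 1000 entities 100 at a time. Both functions are hypothetical cost models, not App Engine calls; in the db API the cursor approach corresponds to Query.cursor() and Query.with_cursor().

```python
def offset_paging_reads(total, page_size):
    """Reads billed when each page is fetched with fetch(page_size, offset=...)."""
    reads = 0
    for offset in range(0, total, page_size):
        returned = min(page_size, total - offset)
        reads += 1 + offset + returned  # skipped entities are billed too
    return reads


def cursor_paging_reads(total, page_size):
    """Reads billed when each page resumes from the previous page's cursor."""
    reads = 0
    for start in range(0, total, page_size):
        returned = min(page_size, total - start)
        reads += 1 + returned  # nothing is skipped, so nothing extra is billed
    return reads


print(offset_paging_reads(1000, 100))  # 5510
print(cursor_paging_reads(1000, 100))  # 1010
```

The offset cost grows quadratically with the number of pages, while the cursor cost stays proportional to the entities actually read.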
Just to make sure:
I'm almost sure:
```python
Example.all().count(10000)
```
This one uses small datastore operations (there is no need to fetch the entities, only their keys), so it would count as 1 read plus up to 10,000 small operations, one per key counted.
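Under that assumption, a keys-only query is billed differently from a full-entity query. A sketch of the two cost models side by side, assuming 1 read per query plus 1 small operation per key for keys-only queries, and 1 read per entity otherwise; the function names are hypothetical.

```python
def keys_only_cost(num_keys):
    # Assumed keys-only model: 1 read for the query, 1 small operation per key.
    return {"reads": 1, "small_ops": num_keys}


def full_entity_cost(num_entities):
    # Full entities: 1 read for the query plus 1 read per entity.
    return {"reads": 1 + num_entities, "small_ops": 0}


# With only 1000 Example entities, count(10000) sees 1000 keys.
print(keys_only_cost(1000))    # {'reads': 1, 'small_ops': 1000}
print(full_entity_cost(1000))  # {'reads': 1001, 'small_ops': 0}
```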