Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the best way to count results in GQL?

I figure one way to do a count is like this:

foo = db.GqlQuery("SELECT * FROM bar WHERE baz = 'baz')
my_count = foo.count()

What I don't like is my count will be limited to 1000 max and my query will probably be slow. Anyone out there with a workaround? I have one in mind, but it doesn't feel clean. If only GQL had a real COUNT Function...

like image 403
barneytron Avatar asked Jan 07 '09 19:01

barneytron


4 Answers

You have to flip your thinking when working with a scalable datastore like GAE to do your calculations up front. In this case that means you need to keep counters for each baz and increment them whenever you add a new bar, instead of counting at the time of display.

class CategoryCounter(db.Model):
    category = db.StringProperty()
    count = db.IntegerProperty(default=0)

then when creating a Bar object, increment the counter

def createNewBar(category_name):
  bar = Bar(...,baz=category_name)

  counter = CategoryCounter.filter('category =',category_name).get()
  if not counter:
    counter = CategoryCounter(category=category_name)
  else:
    counter.count += 1
  bar.put()
  counter.put()

db.run_in_transaction(createNewBar,'asdf')

now you have an easy way to get the count for any specific category

CategoryCounter.filter('category =',category_name).get().count
like image 176
Jehiah Avatar answered Oct 15 '22 17:10

Jehiah


+1 to Jehiah's response.

Official and blessed method on getting object counters on GAE is to build sharded counter. Despite heavily sounding name, this is pretty straightforward.

like image 43
zgoda Avatar answered Oct 15 '22 18:10

zgoda


Count functions in all databases are slow (eg, O(n)) - the GAE datastore just makes that more obvious. As Jehiah suggests, you need to store the computed count in an entity and refer to that if you want scalability.

This isn't unique to App Engine - other databases just hide it better, up until the point where you're trying to count tens of thousands of records with each request, and your page render time starts to increase exponentially...

like image 7
Nick Johnson Avatar answered Oct 15 '22 18:10

Nick Johnson


According to the GqlQuery.count() documentation, you can set the limit to be some number greater than 1000:

from models import Troll
troll_count = Troll.all(keys_only=True).count(limit=31337)

Sharded counters are the right way to keep track of numbers like this, as folks have said, but if you figure this out late in the game (like me) then you'll need to initialize the counters from an actual count of objects. But this is a great way to burn through your free quota of Datastore Small Operations (50,000 I think). Every time you run the code, it will use up as many ops as there are model objects.

like image 2
rescdsk Avatar answered Oct 15 '22 16:10

rescdsk