
How do I build a flexible counter with 1000+ rows but few reads in Google App Engine?

I have a list of users that only administrators can see (= few reads). This list also displays a count of the number of users in the datastore. Because the list could grow larger than 1000, my first thought was to avoid a normal count() and use a sharded counter instead.

However, the problem is that the admins also have access to various search filters (in the GUI), such as only viewing male/female users and so on. It's important that the count reflects these filters, so that they can get the number of female users, male users and a myriad of other combinations.

Because of this, neither sharded counters nor high-concurrency counters without sharding seem like a good idea: I would need to create a separate counter for every combination of search filters.

Should I simply create a loop of count() calls, such as described here, or is this very bad practice? How would I do it otherwise?

Note that this counter is for an admin interface and would have a very limited number of reads. This is really a case where I would like to sacrifice some read performance for flexibility and accuracy. Although the list should be able to grow beyond 1000, it's not expected to grow larger than 10 000.

Aneon asked Oct 15 '22 03:10


2 Answers

"Loop of counts" is slow, but these days you can make it a bit better with cursors. Normally I would recommend denormalizing into all the "filtered" counters you need, but that slows down user addition and deletion (and probably demographic changes as well), so, given your particular use case with a very low volume of reads, you can probably get away with the "loop of counts" approach (plus cursors;-).

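As a rough illustration of the "loop of counts plus cursors" idea, here is a minimal sketch. It does not talk to the real datastore: `make_fake_fetch_page` is a hypothetical in-memory stand-in whose return shape (batch, next cursor, more flag) is only assumed to resemble what a paged keys-only query would give you, and the "cursor" is just a list offset. The point is the loop in `count_with_cursors`, which keeps fetching pages from where the previous one ended and adds up the batch sizes.

```python
from typing import Callable, List, Optional, Tuple

# Assumed page shape: (keys in this batch, cursor for the next batch, more results?)
Page = Tuple[List[int], Optional[int], bool]


def make_fake_fetch_page(keys: List[int]) -> Callable[..., Page]:
    """Hypothetical in-memory stand-in for a filtered, keys-only query.

    The cursor here is simply an offset into the list.
    """
    def fetch_page(page_size: int, start_cursor: Optional[int] = None) -> Page:
        start = start_cursor or 0
        batch = keys[start:start + page_size]
        next_cursor = start + len(batch)
        more = next_cursor < len(keys)
        return batch, next_cursor, more
    return fetch_page


def count_with_cursors(fetch_page: Callable[..., Page], page_size: int = 1000) -> int:
    """Count all results by paging through them, resuming at each cursor."""
    total = 0
    cursor = None
    while True:
        batch, cursor, more = fetch_page(page_size, cursor)
        total += len(batch)
        if not more or not batch:
            break
    return total


# Pretend these are the keys matching one filter combination (e.g. female users).
female_user_keys = list(range(2500))
print(count_with_cursors(make_fake_fetch_page(female_user_keys)))
```

Each filter combination would get its own query and its own run of this loop, which is slow but acceptable when only a handful of admins ever trigger it.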
Alex Martelli answered Oct 31 '22 17:10


I've tried two approaches:

1) Write my own task that queries the datastore (the query is a key-descending query) with a fixed limit of entities (say 50). It then enqueues the next task to continue querying where it left off. Each task passes two parameters to the next one: the position where it left off (like a cursor) and a running total of the number of entities seen so far.

2) This approach is much easier: use the mapreduce library provided by Google for App Engine. It runs entirely in user space, so you just have to download and build the library and include it in your project. Basically, it handles iterating through all the entities you specify and lets you write a handler for what to do with each one (like incrementing a counter). See the details here: mapreduce.appspot.com - they even have a sample app that does just what you are asking for. The only problem is that the results appear in your browser and are not stored in the datastore unless you do that yourself.
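
The first approach (chained tasks) can be sketched roughly as follows. This is only an in-memory simulation under stated assumptions: a `deque` stands in for the task queue, a plain list of integers stands in for entity keys, and `count_task`, `run_chain` and `BATCH_SIZE` are hypothetical names, not App Engine APIs. Each task reads one batch in key-descending order, then enqueues its successor with the last key seen and the running total.

```python
from collections import deque

BATCH_SIZE = 50  # fixed limit of entities per task, as in the description above


def count_task(keys, queue, results, last_key=None, running_total=0):
    """One task: read up to BATCH_SIZE keys below last_key, then enqueue the next task."""
    # Key-descending query, resuming strictly below the last key already seen.
    remaining = [k for k in keys if last_key is None or k < last_key]
    batch = sorted(remaining, reverse=True)[:BATCH_SIZE]
    if not batch:
        # Nothing left: the chain ends and the final total is recorded.
        results["total"] = running_total
        return
    # Pass on where we left off and the running total.
    queue.append((batch[-1], running_total + len(batch)))


def run_chain(keys):
    """Simulate the task queue by running tasks until none are enqueued."""
    queue = deque([(None, 0)])
    results = {}
    while queue:
        last_key, running_total = queue.popleft()
        count_task(keys, queue, results, last_key, running_total)
    return results["total"]


print(run_chain(list(range(1234))))
```

In a real deployment the final total would be written somewhere durable (for example a datastore entity) rather than into a local dictionary, and the filter for the particular admin view would be part of the query each task runs.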

aloo answered Oct 31 '22 17:10