 

Counting Unique Users using Mapreduce for Java Appengine

I'm trying to count the number of unique users per day on my Java App Engine app. I have decided to use the MapReduce framework (mapreduce.appspot.com) for Java App Engine to do this calculation offline. I've managed to create a MapReduce job that goes through all of my entities, each of which represents a single user's session event. I can use a simple counter as well. I have several questions, though:

1) How do I increment a counter only once for each user ID? I am currently mapping over entities that contain a user ID property, but many of these entities may contain the same user ID, so how do I count it only once?

2) Once the results of the job are stored in these counters, how can I persist them to the datastore? I can see the counter results on the MapReduce status page, but I want them persisted to the datastore automatically.

Ideas?

aloo asked Jun 28 '10 00:06



1 Answer

I haven't actually used the MapReduce functionality yet, but my theoretical understanding is that you can write things to the datastore from within your mapper. You could create an entity type called something like UniqueCount and insert one entity every time your mapper sees an ID that it hasn't seen before. Then you can count how many unique IDs you have. In fact, you can just update a counter every time you find a new unique entity. You may want to google "sharded counter" for hints on creating a counter in the datastore that can handle high throughput.
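As a rough illustration of that idea, here is a minimal sketch in plain Java. It does not call the App Engine or MapReduce APIs: the `UniqueUserCounter` class, its method names, and the in-memory maps standing in for the datastore are all hypothetical, made up for this example. It only shows the logic of "mark each (day, user ID) pair once, and bump the day's counter only the first time it is seen".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the maps below are an in-memory stand-in for the datastore.
class UniqueUserCounter {
    // day -> user IDs already seen on that day (plays the role of UniqueCount entities)
    private final Map<String, Set<String>> seenByDay = new HashMap<>();
    // day -> number of distinct user IDs seen on that day
    private final Map<String, Integer> countByDay = new HashMap<>();

    // Call once per session entity in the mapper.
    // Returns true only the first time this user ID is seen for the given day.
    boolean record(String day, String userId) {
        Set<String> seen = seenByDay.computeIfAbsent(day, d -> new HashSet<>());
        boolean firstTime = seen.add(userId); // add() returns false for duplicates
        if (firstTime) {
            countByDay.merge(day, 1, Integer::sum);
        }
        return firstTime;
    }

    // Number of distinct users recorded for the given day (0 if none).
    int uniqueUsers(String day) {
        return countByDay.getOrDefault(day, 0);
    }
}
```

In a real mapper the set and the counter would have to live in the datastore rather than in memory, since mappers run on many instances. One way to do that, assuming you key each UniqueCount entity by something like day plus user ID, is to rely on the fact that writing an entity with an existing key overwrites it rather than creating a duplicate; the counter update would then need a transaction or a sharded counter to stay correct under concurrent writes.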

Eventually, when they finish the Reduce functionality, I imagine this whole task will become pretty trivial.

Peter Recore answered Oct 19 '22 03:10
