While exploring ways to do real-time analytics with MongoDB, I found a fairly standard way to do sums, but nothing for more complex aggregation. Some things that have helped...
The basic approach for sums is to atomically increment counter fields in a document for each new record that comes in, which caches the results of common queries:
Stats.collection.update({"keys" => ["a", "b", "c"]}, {"$inc" => {"counter_1" => 1, "counter_2" => 1}}, :upsert => true)
This doesn't work for aggregates other than sums, though. My question is: can something like this be done for averages, min, and max in MongoDB?
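One way to see which of these aggregates fit the `$inc` pattern is to split each one into the values that would actually be stored. The plain-Ruby sketch below involves no MongoDB at all, and the class and field names are hypothetical. It keeps a count and a sum (both only ever change by addition, so each could be an increment), derives the average from them on read, and shows that min and max are different in kind: they need a comparison against the currently stored value rather than an increment.

```ruby
# Plain-Ruby sketch (no MongoDB): which aggregates can be kept up to date
# one record at a time. RunningStats and its fields are hypothetical.
class RunningStats
  attr_reader :count, :sum, :min, :max

  def initialize
    @count = 0
    @sum = 0
    @min = nil
    @max = nil
  end

  # count and sum only change by addition, so each could map onto an "$inc".
  # min and max need a comparison against the current stored value instead.
  def add(x)
    @count += 1
    @sum += x
    @min = x if @min.nil? || x < @min
    @max = x if @max.nil? || x > @max
    self
  end

  # The average is not stored; it is derived from sum and count on read.
  def average
    @count.zero? ? nil : @sum.fdiv(@count)
  end
end

stats = RunningStats.new
[30, 39, 22].each { |age| stats.add(age) }
stats.average # 91 / 3, roughly 30.33
```

The design point is that an average is not itself incrementable, but its two ingredients are.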
Say you have a document like this:
{
:date => "04/27/2011",
:page_views => 1000,
:user_birthdays => ["12/10/1980", "6/22/1971", ...] # 1000 total
}
Could you do some atomic or optimized/real-time operation that grouped the birthdays into something like this?
{
:date => "04/27/2011",
:page_views => 1000,
:user_birthdays => ["12/10/1980", "6/22/1971", ...], # 1000 total
:average_age => 27.8,
:age_rank => {
"0 to 20" => 180,
"20 to 30" => 720,
"30 to 40" => 100,
"40 to 50" => 0
}
}
...just as you can run Doc.collection.update({"x" => 1}, {"$push" => {"user_birthdays" => "12/10/1980"}}) to add something to an array without having to load the document, can you do something similar to average or otherwise aggregate the array? Is there a pattern along these lines that you use for real-time aggregation?
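A minimal sketch of how the running-total idea could ride along with the `$push`, under stated assumptions: a hypothetical helper `birthday_update` (not part of any driver) builds one update document that appends the raw birthday and, in the same write, increments hypothetical `age_sum` and `age_count` fields plus the matching `age_rank` bucket through a dotted field path. It assumes dates are M/D/YYYY, ages are measured on the document's own date, and each bucket includes its lower bound but not its upper bound. It is plain Ruby and only produces a hash.

```ruby
require "date"

# Bucket labels copied from the age_rank example in the question.
# Assumption: lower bound inclusive, upper bound exclusive.
AGE_BUCKETS = {
  "0 to 20" => 0...20,
  "20 to 30" => 20...30,
  "30 to 40" => 30...40,
  "40 to 50" => 40...50
}.freeze

# Parses "M/D/YYYY" strings such as "6/22/1971".
def parse_mdy(str)
  month, day, year = str.split("/").map(&:to_i)
  Date.new(year, month, day)
end

# Age in whole years on the given date.
def age_on(birthday, on)
  age = on.year - birthday.year
  age -= 1 if on.month < birthday.month || (on.month == birthday.month && on.day < birthday.day)
  age
end

# Hypothetical helper: one update document that appends the birthday and
# bumps the pre-aggregated fields. age_sum and age_count are assumed extra
# fields; average_age would be age_sum / age_count, computed on read.
def birthday_update(birthday_str, as_of_str)
  age = age_on(parse_mdy(birthday_str), parse_mdy(as_of_str))
  bucket, _range = AGE_BUCKETS.find { |_label, range| range.cover?(age) }
  inc = { "age_sum" => age, "age_count" => 1 }
  inc["age_rank.#{bucket}"] = 1 if bucket # dotted path into the sub-document
  { "$push" => { "user_birthdays" => birthday_str }, "$inc" => inc }
end

update = birthday_update("12/10/1980", "04/27/2011")
```

The resulting hash could be passed as the second argument of an update call like the `$push` example above. One caveat of this design: ages are frozen at write time, which only makes sense if they are meant relative to the document's date. Min and max still cannot be expressed as increments this way.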
MapReduce handles this in batch-processing jobs; I'm looking for patterns for something like real-time map-reduce for:
Could you do some atomic or optimized/real-time operation that grouped the birthdays into something like this?
It looks like you've added two fields, age_rank and average_age. These are effectively calculated fields based on the data you already have. If I gave you the document with page views and user birthdays, it should be really trivial for the client code to find min/max, average, etc.
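To illustrate the client-side route, here is a sketch that derives the calculated fields from the full user_birthdays array at read time. The helper names (`parse_mdy`, `age_on`, `summarize`), the extra `min_age`/`max_age` keys, and the bucket boundaries are assumptions for illustration, not part of the original question; ages are measured on the document's :date.

```ruby
require "date"

# Assumed bucket boundaries: lower bound inclusive, upper bound exclusive.
AGE_BUCKETS = {
  "0 to 20" => 0...20,
  "20 to 30" => 20...30,
  "30 to 40" => 30...40,
  "40 to 50" => 40...50
}.freeze

# Parses "M/D/YYYY" strings such as "6/22/1971".
def parse_mdy(str)
  month, day, year = str.split("/").map(&:to_i)
  Date.new(year, month, day)
end

# Age in whole years on the given date.
def age_on(birthday, on)
  age = on.year - birthday.year
  age -= 1 if on.month < birthday.month || (on.month == birthday.month && on.day < birthday.day)
  age
end

# Computes average, min, max and bucket counts from the raw birthdays.
def summarize(doc)
  as_of = parse_mdy(doc[:date])
  ages = doc[:user_birthdays].map { |b| age_on(parse_mdy(b), as_of) }
  {
    :average_age => ages.empty? ? nil : ages.sum.fdiv(ages.size).round(1),
    :min_age => ages.min,
    :max_age => ages.max,
    :age_rank => AGE_BUCKETS.transform_values { |range| ages.count { |a| range.cover?(a) } }
  }
end

doc = {
  :date => "04/27/2011",
  :page_views => 1000,
  :user_birthdays => ["12/10/1980", "6/22/1971", "3/1/1995"]
}
summary = summarize(doc)
```

Nothing is stored here: every reader pays the cost of one pass over the array, which is the trade-off against maintaining the fields on write.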
It seems to me that you're asking MongoDB to perform the aggregation for you server-side, but with the added limitation that you don't want to use Map/Reduce?
If I'm understanding your question correctly, you're looking for something where you can say "add this item to an array and have all dependent items update themselves"? You don't want readers to perform any logic; you want everything to happen "magically" on the server side.
So there are three different ways to tackle this, but only one of them is currently available:
Unfortunately, your only option right now is #1. Fortunately, I know of several people who are using option #1 successfully.
There is work planned for the upcoming 1.9.x unstable release that may include aggregation support.
See: https://jira.mongodb.org/browse/SERVER-447
Of course, it may get bumped to a later release.