'AVG' and 'SUM' functionality in MongoDB, any tips?

Tags: mongodb, nosql

I'm a relative newbie to MongoDB, but from what I've read there are various ways to find averages and sums of values in a MongoDB database, each with its own benefits and drawbacks.

I'm primarily asking for a way to find the sum and the average of a selection of values as efficiently (i.e. as fast) as possible.

The documents in the collection being queried resemble this structure (with a lot of other fields):

{
    "_id": ObjectId('4e650107580fd649e5000005'),
    "date_added": ISODate("2011-09-05T00:00:00Z"),
    "value": 1500
}

Precalculating things like sums is, in my application, not always possible, because the selection of values to be summed can change (based on date ranges, e.g. what is the average between a start date and an end date). Precalculating averages has the same problem.

From what I've read, MapReduce is definitely not ideal for real-time (i.e. on demand) lookup, so that seems to be out of the question too.

At the moment I'm querying the collection like this (note: this is using pymongo):

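# Fetch only the 'value' field of documents added within the date range.
response = request.db['somecollection'].find(
    {
        'date_added': {
            '$gte': date_start,
            '$lte': date_end
        }
    },
    {
        'value': 1
    }
).limit(500)  # arbitrary cap on the number of results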

I then do the calculation in Python with a for loop over the response. The limit of 500 results is arbitrary, to keep the query from becoming too slow. I'm only retrieving the value, and none of the other fields.

Is this the most efficient way of doing this calculation, or are there other methods to accomplish what I need?
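For reference, the client-side loop described above might look roughly like the sketch below. The helper name `sum_and_average` and the sample documents are hypothetical, made up for illustration; the function accepts any iterable of dict-like documents, so a pymongo cursor could be passed in place of the list.

```python
from datetime import datetime


def sum_and_average(docs, field="value"):
    """Return (total, average) of `field` over an iterable of documents.

    Documents that lack the field are skipped. The average is None when
    no document contributes a value, which avoids a division by zero.
    """
    total = 0
    count = 0
    for doc in docs:
        if field in doc:
            total += doc[field]
            count += 1
    average = total / count if count else None
    return total, average


# Plain dicts stand in for the documents a cursor would yield (assumed data).
sample = [
    {"date_added": datetime(2011, 9, 5), "value": 1500},
    {"date_added": datetime(2011, 9, 6), "value": 500},
    {"date_added": datetime(2011, 9, 7), "value": 1000},
]

total, average = sum_and_average(sample)
print(total, average)  # prints: 3000 1000.0
```

A single pass keeps memory use constant regardless of how many documents the query returns, but every matched document still has to travel from the server to the client.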

Caveats:

I can't use the group function because I will probably be using sharding in the future
I can't use MapReduce because it's a function which will be used on-the-fly by users
I can't precalculate a lot of my sums/averages because the selection of values to sum/average is almost always different
I have looked around Stack Overflow and the web for recommendations on how to do this kind of thing, and the advice is fairly open-ended

EDIT:

I should point out that the number of documents returned from the query I posted above could be anything from 1 document to hundreds, but will probably be at most about 150 (with an average of about 60 or 70).


asked Sep 06 '11 09:09

johneth

1 Answer

Give map-reduce a try; it's probably not as slow as you think. I've used it for real-time aggregation over some large data sets, and although it's sometimes not lightning fast, more often than not it's fine. It works best if you can filter down the size of the initial data you're aggregating, e.g.:

db.collection.mapReduce(m, r, { query : { year: 2011 } });

If you need to speed things up even more, consider distributing the data over a sharded cluster. Then the map-reduce processing can be scaled out across multiple shards running in parallel.
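The `m` and `r` functions in the call above are not shown. In MongoDB they would be JavaScript functions; the sketch below only emulates their logic locally in Python, and every name in it (`map_doc`, `reduce_partials`, `finalize`, `run_map_reduce`, the `"all"` key) is a hypothetical choice for illustration. It shows one common way to get both a sum and an average: emit a partial `{sum, count}` per document, merge partials in the reduce step, and compute the average only at the end. Because reduce may run more than once per key, and partial results from several shards have to be combined, its output must have the same shape as its input, which is why the average is not computed inside it.

```python
from collections import defaultdict
from functools import reduce


def map_doc(doc):
    """Map step: emit one (key, partial) pair per document."""
    # A single constant key groups every matched document together.
    return "all", {"sum": doc["value"], "count": 1}


def reduce_partials(a, b):
    """Reduce step: merge two partials into one of the same shape."""
    return {"sum": a["sum"] + b["sum"], "count": a["count"] + b["count"]}


def finalize(partial):
    """Finalize step: derive the average once all merging is done."""
    return {**partial, "avg": partial["sum"] / partial["count"]}


def run_map_reduce(docs):
    """Emulate the whole pipeline locally on an iterable of dict documents."""
    grouped = defaultdict(list)
    for doc in docs:
        key, partial = map_doc(doc)
        grouped[key].append(partial)
    return {key: finalize(reduce(reduce_partials, partials))
            for key, partials in grouped.items()}


docs = [{"value": 1500}, {"value": 500}, {"value": 1000}, {"value": 200}]
result = run_map_reduce(docs)
print(result["all"])  # prints: {'sum': 3200, 'count': 4, 'avg': 800.0}

# Reducing two "shards" separately and merging gives the same answer.
shard_a = reduce(reduce_partials, (map_doc(d)[1] for d in docs[:3]))
shard_b = reduce(reduce_partials, (map_doc(d)[1] for d in docs[3:]))
merged = finalize(reduce_partials(shard_a, shard_b))
```

Averaging per-shard averages directly would give the wrong result whenever the shards hold different numbers of documents; carrying the count alongside the sum avoids that.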

answered Oct 05 '22 05:10

Chris Fulstow


