How do I count multiple keys in the same MongoDB aggregation $group query?

Tags: mongodb, aggregation-framework

I have a query:

db.test.aggregate([ { $group: { _id: '$key', frequency: { $sum: 1 } } } ])

This returns the frequency of every distinct value of key in the test collection. In other words, it gives me the distribution of key.

Now imagine I want to get the distributions of key1, key2, and key3 (so three different distributions).

Obviously, I could run this query three times, once per key, but it seems the query could be optimized by counting all three keys in a single pass. I have experimented and searched all over the web, but so far I am stuck with either running three separate aggregation queries or using a map/reduce function.

Does anyone have any other ideas?

asked May 16 '13 21:05

friendly_programmer

1 Answer

There are a few different approaches you could use here:

1. Use map/reduce: don't. At present it is much faster to run the aggregation framework three times than to use map/reduce for this use case.
2. Run the aggregation three times. This is not optimal, but if you don't have tight time constraints it is the easiest option. If each aggregation takes less than a few seconds anyway, I wouldn't worry about optimizing until it becomes a problem.
3. Group on all the keys at once. This is the best work-around I can think of. The $group stage lets you build an _id from multiple fields, e.g. {"_id": {"a": "$key1", "b": "$key2", "c": "$key3"}}. This creates one group for every combination of key values that actually occurs in the collection. You can group your keys this way and then sum across the results on the client.
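For the second option, the three queries do not have to be written out by hand. A minimal sketch, assuming the field names key1, key2 and key3 from the question: build one single-key $group pipeline per field, then run each one in the shell. The helper name buildPipelines is hypothetical, and the shell loop in the comment assumes a recent shell (mongosh) where aggregate() returns a cursor.

```javascript
// Build one single-key $group pipeline per field name.
// buildPipelines is a hypothetical helper; key1/key2/key3 are the question's field names.
function buildPipelines(keys) {
  const pipelines = {};
  for (const key of keys) {
    // Same stage as the original query, with the field path built from `key`.
    pipelines[key] = [{ $group: { _id: "$" + key, frequency: { $sum: 1 } } }];
  }
  return pipelines;
}

const pipelines = buildPipelines(["key1", "key2", "key3"]);

// In the mongo shell, each pipeline would then be run separately, e.g.:
//   for (const key in pipelines) {
//     printjson(db.test.aggregate(pipelines[key]).toArray());
//   }
```

This still costs three passes over the collection; it only removes the copy-and-paste.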

Let me elaborate. Say we have a collection of shapes, each with a color (f1), a size (f2) and a kind (f3: square, circle, etc.). An aggregation on a multi-field _id could look like:

db.shapes.aggregate([ { $group: { _id: { "f1": "$f1", "f2": "$f2", "f3": "$f3" }, count: { "$sum": 1 } } } ])

and return:

"result" : [
        {
            "_id" : {
                "f1" : "yellow",
                "f2" : "medium",
                "f3" : "triangle"
            },
            "count" : 4086
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "small",
                "f3" : "triangle"
            },
            "count" : 4138
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "big",
                "f3" : "square"
            },
            "count" : 4113
        },
        {
            "_id" : {
                "f1" : "yellow",
                "f2" : "small",
                "f3" : "triangle"
            },
            "count" : 4145
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "small",
                "f3" : "square"
            },
            "count" : 4062
        }

... and so on

You would then sum the results client-side, over far fewer entries than there are documents. The number of groups is at most the product of the number of distinct values of each key, so as long as that is small compared to the total number of documents, this final step takes a negligible amount of time.
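The client-side step could look like the following minimal sketch. It assumes each result row has the shape shown above (an _id holding f1, f2 and f3, plus a count); distributionsFromCombos is a hypothetical helper name. The sample rows are only the five shown in the output excerpt, so the totals below cover that excerpt, not the whole collection.

```javascript
// Collapse multi-key $group results into one distribution per field.
// Assumes each row looks like { _id: { f1, f2, f3 }, count }.
function distributionsFromCombos(rows, fields) {
  const dists = {};
  for (const field of fields) {
    dists[field] = {};
  }
  for (const row of rows) {
    for (const field of fields) {
      const value = row._id[field];
      // Add this combination's count to the running total for its value.
      dists[field][value] = (dists[field][value] || 0) + row.count;
    }
  }
  return dists;
}

// The five sample rows from the output excerpt above.
const rows = [
  { _id: { f1: "yellow", f2: "medium", f3: "triangle" }, count: 4086 },
  { _id: { f1: "red", f2: "small", f3: "triangle" }, count: 4138 },
  { _id: { f1: "red", f2: "big", f3: "square" }, count: 4113 },
  { _id: { f1: "yellow", f2: "small", f3: "triangle" }, count: 4145 },
  { _id: { f1: "red", f2: "small", f3: "square" }, count: 4062 },
];

const dists = distributionsFromCombos(rows, ["f1", "f2", "f3"]);
console.log(dists.f1); // { yellow: 8231, red: 12313 }
```

Each row is visited once per field, so the cost grows with the number of groups returned, not with the number of documents in the collection.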

answered Oct 19 '22 01:10

3rf
