 

How do I count multiple keys in the same MongoDB aggregation $group query?

I have a query:

db.test.aggregate( {$group : { _id : '$key', frequency: { $sum : 1 } } } )

This returns the frequency of each distinct value of key in the test collection. In other words, it gives me the distribution of key.

Now imagine I want to get the distributions of key1, key2, and key3 (so three different distributions).

Obviously, I could run this query 3 times, once per key, but it seems like it should be possible to optimize this by counting all 3 keys in a single query. I have been playing around with it and searching the whole of the inter-webs, but so far, I am resigned to running three separate aggregation queries or using a map/reduce function.

Does anyone have any other ideas?

asked May 16 '13 21:05 by friendly_programmer



1 Answer

There are a few different approaches you could use here:

  1. Use map/reduce: don't do this. Right now it would be much faster to run the aggregation framework 3 times than to use a map reduce function for this use case.

  2. Run aggregation 3 times. This is not optimal, but if you don't have time constraints then this is the easiest option. If your aggregations take less than a few seconds anyway, then I wouldn't worry about optimizing until they become a problem.

  3. Here's the best work-around I can think of. The $group operator allows you to build an _id on multiple fields, e.g. {"_id":{"a":"$key1", "b":"$key2", "c":"$key3"}}. Doing this creates a group for every existing combination of values of your keys. You could group your keys this way and then manually sum across the results in the client.

Let me elaborate. Let's say we have a collection of shapes. These shapes can have a color, a size, and a kind (square, circle, etc.), stored here in the fields f1, f2, and f3. An aggregation on a multi-key _id could look like:

db.shapes.aggregate({$group:{_id:{"f1":"$f1", "f2":"$f2", "f3":"$f3"}, count:{"$sum":1}}})

and return:

"result" : [
        {
            "_id" : {
                "f1" : "yellow",
                "f2" : "medium",
                "f3" : "triangle"
            },
            "count" : 4086
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "small",
                "f3" : "triangle"
            },
            "count" : 4138
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "big",
                "f3" : "square"
            },
            "count" : 4113
        },
        {
            "_id" : {
                "f1" : "yellow",
                "f2" : "small",
                "f3" : "triangle"
            },
            "count" : 4145
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "small",
                "f3" : "square"
            },
            "count" : 4062
        }

... and so on

You would then sum up the results client-side, over a drastically reduced number of entries. Assuming the number of unique values for each key is sufficiently small compared to the total number of documents, you could do this final step in a negligible amount of time.
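As a rough sketch of that client-side step, the plain JavaScript below collapses the combination counts into one distribution per key. The function name marginalCounts and the variable names are made up for illustration, and the sample array holds only the five groups shown above, so the totals are partial; in practice you would pass in the full result array returned by the aggregation.

```javascript
// Sample input: the five groups shown above (the real result has more entries).
const result = [
  { _id: { f1: "yellow", f2: "medium", f3: "triangle" }, count: 4086 },
  { _id: { f1: "red", f2: "small", f3: "triangle" }, count: 4138 },
  { _id: { f1: "red", f2: "big", f3: "square" }, count: 4113 },
  { _id: { f1: "yellow", f2: "small", f3: "triangle" }, count: 4145 },
  { _id: { f1: "red", f2: "small", f3: "square" }, count: 4062 },
];

// Hypothetical helper: for every field of the compound _id, add each group's
// count to the running total of the value that group has for that field.
function marginalCounts(groups) {
  const distributions = Object.create(null);
  for (const { _id, count } of groups) {
    for (const [key, value] of Object.entries(_id)) {
      if (!(key in distributions)) {
        distributions[key] = Object.create(null);
      }
      distributions[key][value] = (distributions[key][value] || 0) + count;
    }
  }
  return distributions;
}

const distributions = marginalCounts(result);
console.log(JSON.stringify(distributions, null, 2));
```

The loop does one pass over the grouped results, so its cost depends on the number of distinct key combinations rather than on the number of documents in the collection.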

answered Oct 19 '22 01:10 by 3rf