
Distinct count of multiple fields using mongodb aggregation

I'm trying to count the distinct values of multiple fields with a single MongoDB aggregation query.

So here's my data:

{
    "car_type": "suv",
    "color": "red",
    "num_doors": 4
},
{
    "car_type": "hatchback",
    "color": "blue",
    "num_doors": 4
},
{
    "car_type": "wagon",
    "color": "red",
    "num_doors": 4
}

I want a distinct count of each field:

distinct_count_car_type=3
distinct_count_color=2
distinct_count_num_doors=1

I was able to group by multiple fields and then do a distinct count, but that only gave me a count for the first field, not for all of them. The data set is also large.

Deckard asked Sep 28 '17 14:09


2 Answers

Running the following aggregate pipeline should give you the desired result:

db.collection.aggregate([
    {
        "$group": {
            "_id": null,
            "distinct_car_types": { "$addToSet": "$car_type" },
            "distinct_colors": { "$addToSet": "$color" },
            "distinct_num_doors": { "$addToSet": "$num_doors" }
        }
    },
    {
        "$project": {
            "distinct_count_car_type": { "$size": "$distinct_car_types" },
            "distinct_count_color": { "$size": "$distinct_colors" },
            "distinct_count_num_doors": { "$size": "$distinct_num_doors" }
        }
    }
])
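
To make the two stages easier to follow, here is a rough in-memory sketch in plain JavaScript (not MongoDB code) of what they compute: a `Set` per field stands in for `$addToSet`, and `Set.size` stands in for `$size`. The helper name `distinctCounts` and the `cars` array are made up for illustration, and the sketch assumes the field values are primitives such as strings and numbers.

```javascript
// Sample documents copied from the question.
const cars = [
  { car_type: "suv", color: "red", num_doors: 4 },
  { car_type: "hatchback", color: "blue", num_doors: 4 },
  { car_type: "wagon", color: "red", num_doors: 4 },
];

// Hypothetical helper: keeps one Set per requested field (the role of
// $addToSet) and then reports each Set's size (the role of $size).
function distinctCounts(docs, fields) {
  const sets = Object.fromEntries(fields.map((f) => [f, new Set()]));
  for (const doc of docs) {
    for (const f of fields) {
      // This sketch simply skips documents that lack the field.
      if (f in doc) sets[f].add(doc[f]);
    }
  }
  return Object.fromEntries(
    fields.map((f) => [`distinct_count_${f}`, sets[f].size])
  );
}

const counts = distinctCounts(cars, ["car_type", "color", "num_doors"]);
console.log(counts);
```

As in the pipeline, every distinct value is held in memory until the final count is taken, which is worth keeping in mind for fields with very many distinct values.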

chridam answered Oct 20 '22 19:10


You're looking for the power of ... $objectToArray!

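db.foo.aggregate([
  // Turn each document into an array of {k, v} pairs, one pair per field
  {$project: {x: {$objectToArray: "$$CURRENT"}}},
  // Emit one document per {k, v} pair
  {$unwind: "$x"},
  // Ignore the _id field
  {$match: {"x.k": {$ne: "_id"}}},
  // Collect the unique values seen for each field name
  {$group: {_id: "$x.k", y: {$addToSet: "$x.v"}}},
  // Count the unique values per field
  {$addFields: {size: {$size: "$y"}}}
]);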

This will yield:

{ "_id" : "num_doors", "y" : [ 4 ], "size" : 1 }
{ "_id" : "color", "y" : [ "blue", "red" ], "size" : 2 }
{
    "_id" : "car_type",
    "y" : [
        "wagon",
        "hatchback",
        "suv"
    ],
    "size" : 3
}

You can $project or $addFields as you see fit to include or exclude the set of unique values or the size.
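
For readers who want to see the idea outside the database, the following is a small plain JavaScript sketch (again, not MongoDB code) of the same field-agnostic approach: `Object.entries` plays the part of `$objectToArray` plus `$unwind`, and a `Map` of `Set`s plays the part of the `$group` with `$addToSet`. The function name `distinctCountsAllFields` is hypothetical, and the sketch assumes primitive field values, since a `Set` compares objects by reference.

```javascript
// Sample documents copied from the question.
const cars = [
  { car_type: "suv", color: "red", num_doors: 4 },
  { car_type: "hatchback", color: "blue", num_doors: 4 },
  { car_type: "wagon", color: "red", num_doors: 4 },
];

// Hypothetical helper: discovers field names at run time instead of
// listing them, then gathers the unique values seen for each one.
function distinctCountsAllFields(docs) {
  const sets = new Map();
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (key === "_id") continue; // same exclusion as the $match stage
      if (!sets.has(key)) sets.set(key, new Set());
      sets.get(key).add(value);
    }
  }
  // Shape each result like the pipeline output: _id, y (values), size.
  return [...sets].map(([key, values]) => ({
    _id: key,
    y: [...values],
    size: values.size,
  }));
}

const perField = distinctCountsAllFields(cars);
console.log(perField);
```

The advantage of this shape is that new fields are picked up automatically; the cost is that every field of every document is visited, whether or not its count is wanted.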

Buzz Moschetti answered Oct 20 '22 19:10