I'm trying to count the distinct values of multiple fields with a single MongoDB aggregation query.
So here's my data:
```json
{
  "car_type": "suv",
  "color": "red",
  "num_doors": 4
},
{
  "car_type": "hatchback",
  "color": "blue",
  "num_doors": 4
},
{
  "car_type": "wagon",
  "color": "red",
  "num_doors": 4
}
```
I want a distinct count of each field:
distinct_count_car_type=3
distinct_count_color=2
distinct_count_num_doors=1
I was able to group by multiple fields and then do a distinct count, but it only gave me a count for the first field, not for all of them. Also, the data set is large.
MongoDB $count aggregation: the $count stage passes a single document to the next stage of the aggregation pipeline, containing the number of documents it received. Its syntax is { $count: "<string>" }, where the string is the name of the output field that holds the count. Note that $count counts documents, not distinct values.
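Because $count counts documents rather than values, one way to count the distinct values of a single field is to $group on that field first, so that each distinct value becomes one document, and then $count those documents. A minimal sketch written as plain JavaScript that builds the pipeline; distinctCountPipeline is a hypothetical helper name:

```javascript
// Hypothetical helper: builds a pipeline that counts the distinct values of one field.
// $group collapses the input to one document per distinct value of `field`,
// then $count emits a single document holding the number of those groups.
function distinctCountPipeline(field) {
  return [
    { $group: { _id: "$" + field } },
    { $count: "distinct_count_" + field },
  ];
}

// Usage (mongo shell):
//   db.collection.aggregate(distinctCountPipeline("color"))
//   -> { "distinct_count_color" : 2 } for the sample data above
```

Documents that lack the field are grouped under null and therefore count as one extra distinct value; add a $match stage first if that matters. This approach also needs one query per field.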
To count unique values, use distinct() instead of find(), and the length of the returned array instead of count(). The first argument to distinct is the field whose distinct values you want; the optional second argument is a query filter that selects which documents to consider.
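The same distinct() plus length idea can be looped over several fields. A minimal sketch, assuming the official MongoDB Node.js driver, whose Collection.distinct(key, filter) resolves to an array of unique values; distinctCounts and the "cars" collection name are hypothetical. It issues one distinct command per field, and each result array must fit within MongoDB's 16 MB document size limit, so it suits fields of moderate cardinality best.

```javascript
// Hypothetical helper: counts distinct values per field using the driver's
// Collection.distinct(key, filter), which resolves to an array of unique values.
async function distinctCounts(collection, fields, filter = {}) {
  const counts = {};
  for (const field of fields) {
    // One distinct command per field; `filter` selects which documents are considered.
    const values = await collection.distinct(field, filter);
    counts["distinct_count_" + field] = values.length;
  }
  return counts;
}

// Usage with a real driver collection (requires a running MongoDB):
//   const counts = await distinctCounts(db.collection("cars"), ["car_type", "color", "num_doors"]);
```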
You can use $and in an aggregation $match, but you usually don't have to write it: listing several conditions in one filter document combines them with an implicit AND. You can also chain several $match stages, which is handy when one of the conditions needs to be expressed differently.
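To illustrate the point about $and, here are three pipelines that should select the same documents from the sample data (the two red cars, both with four doors). This is a sketch written as plain JavaScript values you could pass to aggregate(); the variable names are illustrative only.

```javascript
// 1. Implicit AND: several conditions in one $match filter document.
const implicitAnd = [{ $match: { color: "red", num_doors: 4 } }];

// 2. Explicit $and: equivalent here, but required when a key would otherwise
//    appear twice in the same object (for example, two separate $or clauses).
const explicitAnd = [
  { $match: { $and: [{ color: "red" }, { num_doors: 4 }] } },
];

// 3. Chained $match stages: each stage filters the output of the previous one,
//    so the result is again the AND of both conditions.
const chained = [{ $match: { color: "red" } }, { $match: { num_doors: 4 } }];
```

MongoDB's pipeline optimizer can coalesce consecutive $match stages into one, so chaining them usually costs little.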
On large collections of millions of documents, MongoDB's aggregation has been reported to perform much worse than Elasticsearch. Performance degrades as the collection grows and MongoDB starts using the disk because system RAM is limited. A $lookup stage whose foreign field is not indexed can also be very slow.
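If a $group stage exceeds MongoDB's per-stage memory limit (100 MB), the aggregation can spill to temporary files on disk when allowDiskUse is enabled; depending on the server version this may already be the default. A minimal sketch, assuming the official MongoDB Node.js driver, where aggregate(pipeline, options) returns a cursor; aggregateWithDiskUse is a hypothetical helper name:

```javascript
// Hypothetical helper: runs a pipeline while allowing memory-hungry stages
// such as $group to write temporary files to disk instead of failing at the
// per-stage memory limit. toArray() resolves to all result documents.
function aggregateWithDiskUse(collection, pipeline) {
  return collection.aggregate(pipeline, { allowDiskUse: true }).toArray();
}

// Mongo shell equivalent:
//   db.collection.aggregate(pipeline, { allowDiskUse: true })
```

allowDiskUse only relaxes the per-stage memory limit; it does not lift the 16 MB limit on the size of a single result document.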
Running the following aggregate pipeline should give you the desired result:
```javascript
db.collection.aggregate([
  {
    "$group": {
      "_id": null,
      "distinct_car_types": { "$addToSet": "$car_type" },
      "distinct_colors": { "$addToSet": "$color" },
      "distinct_num_doors": { "$addToSet": "$num_doors" }
    }
  },
  {
    "$project": {
      "distinct_count_car_type": { "$size": "$distinct_car_types" },
      "distinct_count_color": { "$size": "$distinct_colors" },
      "distinct_count_num_doors": { "$size": "$distinct_num_doors" }
    }
  }
])
```
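The $addToSet/$size pipeline above hard-codes three fields. A minimal sketch of a hypothetical helper, buildDistinctCountPipeline, that generates the same $group/$project shape for any list of field names (the intermediate array names follow a distinct_<field> pattern instead of the plural names used above, and _id is dropped from the output). Keep in mind that every distinct value of every field is collected into the single $group output document, so fields with very many distinct values can exceed MongoDB's 16 MB document size limit.

```javascript
// Hypothetical helper: builds the $group/$project pipeline for any list of fields.
function buildDistinctCountPipeline(fields) {
  const group = { _id: null };  // one group for the whole collection
  const project = { _id: 0 };   // drop the null _id from the result
  for (const field of fields) {
    // $addToSet collects each field's unique values into one array...
    group["distinct_" + field] = { $addToSet: "$" + field };
    // ...and $size turns that array into a count.
    project["distinct_count_" + field] = { $size: "$distinct_" + field };
  }
  return [{ $group: group }, { $project: project }];
}

// Usage (mongo shell):
//   db.collection.aggregate(buildDistinctCountPipeline(["car_type", "color", "num_doors"]))
```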
You're looking for the power of... $objectToArray!
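```javascript
db.foo.aggregate([
  // Turn each document into an array of { k: fieldName, v: value } pairs
  { $project: { x: { $objectToArray: "$$CURRENT" } } },
  // One document per field/value pair
  { $unwind: "$x" },
  // Ignore the _id field
  { $match: { "x.k": { $ne: "_id" } } },
  // Group by field name, collecting the unique values of each field
  { $group: { _id: "$x.k", y: { $addToSet: "$x.v" } } },
  // Count the unique values
  { $addFields: { size: { $size: "$y" } } }
]);
```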
This will yield (the order of the output documents, and of the values inside each y array, is not guaranteed):
```
{ "_id" : "num_doors", "y" : [ 4 ], "size" : 1 }
{ "_id" : "color", "y" : [ "blue", "red" ], "size" : 2 }
{
  "_id" : "car_type",
  "y" : [
    "wagon",
    "hatchback",
    "suv"
  ],
  "size" : 3
}
```
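To make the mechanics of the $objectToArray pipeline concrete, here is a client-side illustration in plain JavaScript. It does not talk to MongoDB and is not a substitute for running the pipeline on a large collection; distinctValuesPerField is a hypothetical name, and the comments map each step to the stage it imitates.

```javascript
// Client-side illustration only: mirrors the $objectToArray pipeline.
// Object.entries plays $objectToArray, looping over the pairs plays $unwind,
// skipping "_id" plays $match, and one Set per key plays $addToSet.
// Set uses JavaScript equality, so it only matches $addToSet for primitives.
function distinctValuesPerField(docs) {
  const sets = new Map();
  for (const doc of docs) {
    for (const [k, v] of Object.entries(doc)) {
      if (k === "_id") continue;
      if (!sets.has(k)) sets.set(k, new Set());
      sets.get(k).add(v);
    }
  }
  // Shape each entry like the pipeline output: { _id: field, y: values, size }.
  return [...sets].map(([k, s]) => ({ _id: k, y: [...s], size: s.size }));
}

const cars = [
  { car_type: "suv", color: "red", num_doors: 4 },
  { car_type: "hatchback", color: "blue", num_doors: 4 },
  { car_type: "wagon", color: "red", num_doors: 4 },
];
console.log(distinctValuesPerField(cars));
```

For the three sample documents it reports size 3 for car_type, 2 for color and 1 for num_doors, matching the pipeline output.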
You can $project or $addFields as you see fit to include or exclude the set of unique values or the size.
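If you want a single document shaped like the distinct_count_* values in the question, one option is to append two more stages to the $objectToArray pipeline: a $group with _id: null that collects { k, v } pairs, then $replaceRoot with $arrayToObject. A sketch under that assumption; reshapeStages is an illustrative name:

```javascript
// Extra stages to append after the $objectToArray/$addFields pipeline.
const reshapeStages = [
  // Collect one { k: "distinct_count_<field>", v: size } pair per field.
  {
    $group: {
      _id: null,
      counts: { $push: { k: { $concat: ["distinct_count_", "$_id"] }, v: "$size" } },
    },
  },
  // $arrayToObject turns [{ k, v }, ...] into { "distinct_count_<field>": size, ... }.
  { $replaceRoot: { newRoot: { $arrayToObject: "$counts" } } },
];
```

Appending these stages should produce one document with distinct_count_car_type, distinct_count_color and distinct_count_num_doors fields (3, 2 and 1 for the sample data); the order of the keys is not guaranteed.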