
Aggregate Count Array Members Matching Condition

As the title says, I'm having trouble counting elements in an array using MongoDB. I have a DB with only one document, structured as follows:

 {_id: ObjectId("abcdefghilmnopq"),
    "Array": [
      {field1: "val1",
       field2: "val2",
       field3: "val3",
       ...
       },
       {field1: "Value1",
        field2: "Value2",
        field3: "Value3",
       ...
       },
        ...
     ]
 }
        

I want to count the number of array elements that match a certain condition (e.g. count all elements which have field1 equal to "a"). I'm trying this code:

db.collection.aggregate([
{ $unwind : {path: "$Array", 
             includeArrayIndex: "arrayIndex"}},
{ $match : { "Array.field1" : "a"}},
{ $project : { _id : 0, 
               Array : 1, 
               arrayIndex: 1, 
               total: {$size: "$Array"}}}
])

but I receive this error:

Command failed with error 17124: 'The argument to $size must be an array, but was of type: object' on server

I looked through several answers to this problem, but I didn't find anything that resolves it. I mean, 'Array' IS an array!

Andrea Cristiani asked Jun 01 '18 10:06



1 Answer

The error is because it's no longer an array after you $unwind and therefore no longer a valid argument to $size.

You appear to be attempting to "merge" a couple of existing answers without understanding what they are doing. What you really want here is $filter and $size:

db.collection.aggregate([
  { "$project": {
    "total": {
      "$size": {
        "$filter": {
          "input": "$Array",
          "cond": { "$eq": [ "$$this.field1", "a" ] }
        }
      }
    }
  }}
])

Or "reinvent the wheel" using $reduce:

db.collection.aggregate([
  { "$project": {
    "total": {
      "$reduce": {
        "input": "$Array",
        "initialValue": 0,
        "in": {
          "$sum": [
            "$$value", 
            { "$cond": [{ "$eq": [ "$$this.field1", "a" ] }, 1, 0] }
          ]
        }
      }
    }
  }}
])

Or for what you were trying to do with $unwind, you actually $group again in order to "count" how many matches there were:

db.collection.aggregate([
  { "$unwind": "$Array" },
  { "$match": { "Array.field1": "a" } },
  { "$group": {
    "_id": "$_id",
    "total": { "$sum": 1 }
  }}
])

The first two forms are the "optimal" ones for modern MongoDB environments. The final form with $unwind and $group is a "legacy" construct which has not really been necessary for this type of operation since MongoDB 2.6, though the alternatives available back then used slightly different operators.

In the first two forms we compare the field1 value of each array element while it is still part of an array. Both $filter and $reduce are modern operators designed to work with an existing array in place. Each applies the same comparison using the aggregation $eq operator, which returns a boolean value based on whether its arguments are "equal" or not. In this case, it tests whether field1 on each array member equals the expected value of "a".

In the case of $filter, the array stays an array, but any elements which do not meet the condition supplied in "cond" are removed from it. Since we still have an "array" as output, we can then use the $size operator to count the array elements left after the filter condition was processed.
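As a rough analogy only (this is plain JavaScript, not MongoDB code), $filter followed by $size behaves much like Array.prototype.filter followed by .length. The sample document and its values are invented for illustration:

```javascript
// Hypothetical sample document; the field values are made up for this sketch.
const doc = {
  _id: 1,
  Array: [
    { field1: "a", field2: "x" },
    { field1: "b", field2: "y" },
    { field1: "a", field2: "z" }
  ]
};

// Analogue of $filter: keep only the elements whose field1 equals "a".
const filtered = doc.Array.filter((el) => el.field1 === "a");

// Analogue of $size: the filtered result is still an array, so it has a length.
const filterTotal = filtered.length;

console.log({ _id: doc._id, total: filterTotal });
```

The point of the analogy is that the array is never taken apart, so the "size" of what is left is always well defined.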

$reduce, on the other hand, works through the array elements and evaluates an expression against each element together with a stored "accumulator" value, which we initialized with "initialValue". In this case the same $eq test is applied within the $cond operator. This is a "ternary" or if/then/else conditional operator: it evaluates an expression which returns a boolean value, then returns the then value when true or the else value when false.

In that expression we return 1 or 0 respectively, and use the $sum operator to add that returned value to the current "accumulator", "$$value".
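Again purely as an analogy in plain JavaScript (not MongoDB code), the $reduce form corresponds roughly to Array.prototype.reduce, with the callback's first argument standing in for "$$value" and the second for "$$this". The sample elements are hypothetical:

```javascript
// Hypothetical array elements, invented for this sketch.
const elements = [
  { field1: "a", field2: "x" },
  { field1: "b", field2: "y" },
  { field1: "a", field2: "z" }
];

const reduceTotal = elements.reduce(
  // `value` plays the role of "$$value", `el` the role of "$$this".
  // The ternary mirrors $cond (1 when matched, 0 otherwise); `+` mirrors $sum.
  (value, el) => value + (el.field1 === "a" ? 1 : 0),
  0 // mirrors "initialValue"
);

console.log(reduceTotal);
```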

The final form uses $unwind on the array. What this actually does is deconstruct the array to create a "new document" for every array member, each carrying its related parent fields from the original document. This effectively "copies" the main document for every array member.

Once you $unwind, the structure of the documents changes to a "flatter" form in which "Array" holds a single element rather than an array. This is why the subsequent $match pipeline stage can then remove the un-matched documents.
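To make the "flattening" concrete, here is a plain JavaScript sketch that imitates what $unwind and $match do to a single document. It is an analogy under made-up sample data, not how the server implements these stages:

```javascript
// Hypothetical sample document; the field values are made up for this sketch.
const doc = {
  _id: 1,
  Array: [
    { field1: "a", field2: "x" },
    { field1: "b", field2: "y" },
    { field1: "a", field2: "z" }
  ]
};

// Imitates $unwind: one copy of the parent document per array member,
// with the "Array" field replaced by that single member.
const unwound = doc.Array.map((el) => ({ ...doc, Array: el }));

// After unwinding, "Array" is a plain object in every copy, not an array.
// This is the situation in which $size raises error 17124.
const stillArray = unwound.some((d) => Array.isArray(d.Array));

// Imitates $match on "Array.field1": drop the copies that do not match.
const matched = unwound.filter((d) => d.Array.field1 === "a");

console.log(unwound.length, stillArray, matched.length);
```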

This brings us to $group which is applied to "bring back together" all of the information related to a common key. In this case it's the _id field of the original document, which was of course copied into every document produced by the $unwind. As we go back to this "common key" as a single document, we can "count" the remaining "documents" extracted from the array using the $sum accumulator.
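The grouping step can be imitated in plain JavaScript as well. This is only a sketch of the idea, with hypothetical documents standing in for what is left after $unwind and $match:

```javascript
// Hypothetical documents as they might look after $unwind and $match.
const matchedDocs = [
  { _id: 1, Array: { field1: "a", field2: "x" } },
  { _id: 1, Array: { field1: "a", field2: "z" } }
];

// Imitates $group with "total": { "$sum": 1 }: one counter per distinct _id.
const totals = new Map();
for (const d of matchedDocs) {
  totals.set(d._id, (totals.get(d._id) || 0) + 1);
}

// One output document per grouping key.
const grouped = [...totals].map(([_id, total]) => ({ _id, total }));

console.log(grouped);
```

One difference worth keeping in mind: a document with no matching members leaves nothing behind after $match, so it produces no group at all, whereas the $filter and $reduce forms still return that document with a total of 0.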

If you wanted the remaining "array" back, you can $push and rebuild the array with only the remaining members:

  { "$group": {
    "_id": "$_id",
    "Array": { "$push": "$Array" },
    "total": { "$sum": 1 }
  }}

But of course, instead of using $size in another pipeline stage, we can simply still "count" as we already did with $sum.

Neil Lunn answered Oct 02 '22 20:10