
Count how many documents contain a field

I have these three MongoDB documents:

{ 
    "_id" : ObjectId("571094afc2bcfe430ddd0815"), 
    "name" : "Barry", 
    "surname" : "Allen", 
    "address" : [
        {
            "street" : "Red", 
            "number" : NumberInt(66), 
            "city" : "Central City"
        }, 
        {
            "street" : "Yellow", 
            "number" : NumberInt(7), 
            "city" : "Gotham City"
        }
    ]
}

{ 
    "_id" : ObjectId("57109504c2bcfe430ddd0816"), 
    "name" : "Oliver", 
    "surname" : "Queen", 
    "address" : {
        "street" : "Green", 
        "number" : NumberInt(66), 
        "city" : "Star City"
    }
}
{ 
    "_id" : ObjectId("5710953ac2bcfe430ddd0817"), 
    "name" : "Tudof", 
    "surname" : "Unknown", 
    "address" : "homeless"
}

The address field is an Array of Objects in the first document, an Object in the second and a String in the third. My goal is to find how many documents in my collection contain the field address.street. In this case the right count is 1, but with my query I get 2:

db.coll.find({"address.street":{"$exists":1}}).count()

I also tried map/reduce. It works, but it is slower, so I would like to avoid it if possible.

DistribuzioneGaussiana asked Apr 15 '16 11:04



2 Answers

The .count() operation is actually "correct" here: it returns the number of "documents" in which the field is present, and "address.street" is present both in the document with the array and in the document with the embedded object. So the general considerations break down to:

If you just want to exclude the documents with the array field

The most efficient way to exclude documents where "street" is a property of an "address" that is an "array" is to add a dot-notation test that the 0 index does not exist:

db.coll.find({
  "address.street": { "$exists": true },
  "address.0": { "$exists": false }
}).count()

$exists is a natively coded operator, so in both tests it does the correct job and does it efficiently.
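To see why the original query returns 2 and this one returns 1, here is a rough plain-JavaScript model of how a dotted path is matched. It is only a sketch, not MongoDB's implementation: the helper pathExists is hypothetical, and the sample documents are trimmed copies of the ones in the question.

```javascript
// Trimmed copies of the three documents from the question.
const docs = [
  { name: "Barry", address: [
    { street: "Red", number: 66, city: "Central City" },
    { street: "Yellow", number: 7, city: "Gotham City" }
  ] },
  { name: "Oliver", address: { street: "Green", number: 66, city: "Star City" } },
  { name: "Tudof", address: "homeless" }
];

// Hypothetical helper modelling { path: { "$exists": true } }.
// When it meets an array it tries the next segment as an array position
// (e.g. "0") and also looks inside each embedded document of the array.
function pathExists(value, parts) {
  if (parts.length === 0) return true;
  const [head, ...rest] = parts;
  if (Array.isArray(value)) {
    const index = /^\d+$/.test(head) ? Number(head) : -1;
    if (index >= 0 && index < value.length && pathExists(value[index], rest)) {
      return true;
    }
    return value.some(el => !Array.isArray(el) && pathExists(el, parts));
  }
  if (value !== null && typeof value === "object" &&
      Object.prototype.hasOwnProperty.call(value, head)) {
    return pathExists(value[head], rest);
  }
  return false; // strings, numbers, null: nothing to descend into
}

// Original query: "address.street" exists.
const originalCount = docs.filter(d =>
  pathExists(d, ["address", "street"])).length;

// Query above: "address.street" exists AND "address.0" does not.
const fixedCount = docs.filter(d =>
  pathExists(d, ["address", "street"]) &&
  !pathExists(d, ["address", "0"])).length;

console.log(originalCount, fixedCount); // 2 1
```

The array document passes the first test because one of its elements has a "street", and it is then rejected by the second test because it has an element at position 0.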

If you intended to count field occurrences

If what you are actually asking for is the "field count", then a "document" whose array contains that "field" in several entries contributes several times to the total.

For that you need the aggregation framework or mapReduce, as you mention. MapReduce uses JavaScript-based processing and is therefore going to be considerably slower than the .count() operation. The aggregation framework also needs to calculate and "will" be slower than .count(), but not by as much as mapReduce.

In MongoDB 3.2 you get some help here from the expanded ability of $sum to work on an array of values as well as being a grouping accumulator. The other helper here is $isArray, which allows a different processing method via $map when the data is in fact "an array":

db.coll.aggregate([
  { "$group": {
    "_id": null,
    "count": {
      "$sum": {
        "$sum": {
          "$cond": {
            "if": { "$isArray": "$address" },
            "then": {
              "$map": {
                "input": "$address",
                "as": "el",
                "in": {
                  "$cond": {
                    "if": { "$ifNull": [ "$$el.street", false ] },
                    "then": 1,
                    "else": 0
                  }
                }
              }
            },
            "else": {
              "$cond": {
                "if": { "$ifNull": [ "$address.street", false ] },
                "then": 1,
                "else": 0
              }
            }
          }
        }
      }
    }
  }}
])
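As a sanity check on what that pipeline computes, the same counting logic can be sketched in plain JavaScript over copies of the question's documents. This is an illustration only: countStreetOccurrences and hasStreet are hypothetical names, and the sketch assumes "street" values are strings, so it does not model how $cond treats values such as 0 or false.

```javascript
// Trimmed copies of the three documents from the question.
const docs = [
  { name: "Barry", address: [
    { street: "Red", number: 66, city: "Central City" },
    { street: "Yellow", number: 7, city: "Gotham City" }
  ] },
  { name: "Oliver", address: { street: "Green", number: 66, city: "Star City" } },
  { name: "Tudof", address: "homeless" }
];

// Stand-in for { "$ifNull": [ "<path>.street", false ] } on one value:
// true only for an embedded document with a non-null "street".
function hasStreet(value) {
  return value !== null && typeof value === "object" &&
    !Array.isArray(value) && value.street !== undefined && value.street !== null;
}

function countStreetOccurrences(documents) {
  return documents.reduce((total, doc) => {
    const perDocument = Array.isArray(doc.address)
      // "then" branch: $map each element to 1 or 0, inner $sum adds them up
      ? doc.address.reduce((n, el) => n + (hasStreet(el) ? 1 : 0), 0)
      // "else" branch: a single 1 or 0 for a non-array address
      : (hasStreet(doc.address) ? 1 : 0);
    // outer $sum: the accumulator of the $group stage
    return total + perDocument;
  }, 0);
}

console.log(countStreetOccurrences(docs)); // 3
```

With the sample data the total is 3: two occurrences from the array document, one from the embedded-object document, and none from the string.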

Earlier versions hinge on a bit more conditional processing in order to treat the array and non-array data differently, and generally require $unwind to process array entries.

Either wrapping the non-array value in a single-element array via $map with MongoDB 2.6:

db.coll.aggregate([
  { "$project": {
    "address": {
      "$cond": {
        "if": { "$ifNull": [ "$address.0", false ] },
        "then": "$address",
        "else": {
          "$map": {
            "input": ["A"],
            "as": "el",
            "in": "$address"
          }
        }
      }
    }
  }},
  { "$unwind": "$address" },
  { "$group": {
    "_id": null,
    "count": {
      "$sum": {
        "$cond": {
          "if": { "$ifNull": [ "$address.street", false ] },
          "then": 1,
          "else": 0
        }
      }
    }
  }}
])

Or providing conditional selection with MongoDB 2.2 or 2.4:

db.coll.aggregate([
  { "$group": {
    "_id": "$_id",
    "address": { 
      "$first": {
        "$cond": [
          { "$ifNull": [ "$address.0", false ] },
          "$address",
          { "$const": [null] }
        ]
      }
    },
    "other": {
      "$push": {
        "$cond": [
          { "$ifNull": [ "$address.0", false ] },
          null,
          "$address"
        ]
      }
    },
    "has": { 
      "$first": {
        "$cond": [
          { "$ifNull": [ "$address.0", false ] },
          1,
          0
        ]
      }
    }
  }},
  { "$unwind": "$address" },
  { "$unwind": "$other" },
  { "$group": {
    "_id": null,
    "count": {
      "$sum": {
        "$cond": [
          { "$eq": [ "$has", 1 ] },
          { "$cond": [
            { "$ifNull": [ "$address.street", false ] },
            1,
            0
          ]},
          { "$cond": [
            { "$ifNull": [ "$other.street", false ] },
            1,
            0
          ]}
        ]
      }
    }
  }}
])

So the latter form "should" perform a bit better than mapReduce, but probably not by much.

In all cases the logic relies on $ifNull as the "logical" form of $exists for the aggregation framework. Paired with $cond, it yields a "truthy" result when the property actually exists and false when it does not. This determines whether 1 or 0 respectively is passed to the overall accumulation via $sum.
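One detail worth keeping in mind is that $ifNull treats a field holding null the same as a missing field, whereas the query operator $exists matches a field whose value is null. The sketch below models the $ifNull and $cond pairing in plain JavaScript. The function names are hypothetical, and the truthiness rule in cond is my reading of the documented behaviour (false, 0, null and undefined count as false), so treat it as an assumption.

```javascript
// Model of { "$ifNull": [ value, replacement ] }:
// a missing field (undefined here) and null both yield the replacement.
function ifNull(value, replacement) {
  return value === undefined || value === null ? replacement : value;
}

// Model of { "$cond": { if, then, else } }.
// Assumption: false, 0, null and undefined evaluate as false,
// every other value evaluates as true.
function cond(test, thenValue, elseValue) {
  const isFalse = test === false || test === 0 ||
    test === null || test === undefined;
  return isFalse ? elseValue : thenValue;
}

// The 1-or-0 flag that the pipelines feed into $sum.
function streetFlag(embedded) {
  return cond(ifNull(embedded.street, false), 1, 0);
}

console.log(streetFlag({ street: "Green" }));    // 1
console.log(streetFlag({ city: "Star City" }));  // 0, field is missing
console.log(streetFlag({ street: null }));       // 0, null counts as missing
```

So if "street" can be stored as null in your data, the aggregation count and a count based on $exists may differ.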

Ideally you have the modern version that can do this in a single $group pipeline stage, but otherwise you need the longer path.

Blakes Seven answered Oct 07 '22 19:10


Can you try this:

db.getCollection('collection_name').find({
        "address.street":{"$exists":1},
        "$where": "Array.isArray(this.address) == false && typeof this.address === 'object'"
});

In the $where clause we exclude documents whose address is an array and include those whose address is of type object. Append .count() to the query to get the number of matching documents.
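Because the $where string is ordinary JavaScript evaluated with this bound to each document, the predicate can be tried outside the database. The snippet below is a sketch under that assumption: whereClause is the string from the query wrapped in a function, while hasStreet is a hypothetical stand-in for the $exists condition.

```javascript
// Trimmed copies of the three documents from the question.
const docs = [
  { name: "Barry", address: [
    { street: "Red", number: 66, city: "Central City" },
    { street: "Yellow", number: 7, city: "Gotham City" }
  ] },
  { name: "Oliver", address: { street: "Green", number: 66, city: "Star City" } },
  { name: "Tudof", address: "homeless" }
];

// The $where expression from the query, with `this` as the document.
function whereClause() {
  return Array.isArray(this.address) == false && typeof this.address === 'object';
}

// Hypothetical stand-in for "address.street": { "$exists": 1 },
// including the case where address is an array of embedded documents.
function hasStreet(doc) {
  const isDocWithStreet = v =>
    v !== null && typeof v === 'object' && !Array.isArray(v) && 'street' in v;
  return Array.isArray(doc.address)
    ? doc.address.some(isDocWithStreet)
    : isDocWithStreet(doc.address);
}

// Both conditions must hold, as in the find() filter.
const matched = docs.filter(doc => hasStreet(doc) && whereClause.call(doc));

console.log(matched.length);                 // 1
console.log(matched.map(doc => doc.name));   // only Oliver
```

Keep in mind that $where runs JavaScript for every candidate document and cannot use an index, so on a large collection it is likely to be slower than a filter built only from native operators.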

Hiren S. answered Oct 07 '22 18:10