I have a collection of documents like this one:
{
"_id" : ObjectId("..."),
"field1": "some string",
"field2": "another string",
"field3": 123
}
I'd like to be able to iterate over the entire collection, and find the entire number of fields there are. In this example document there are 3 (I don't want to include _id), but it ranges from 2 to 50 fields in a document. Ultimately, I'm just looking for the average number of fields per document.
Any ideas?
find({}). count() more fast then collection.
MongoDB aggregate $count element in array For this, MongoDB provides the $size aggregation to count and returns the total number of items in an array. Let's get understand this with the help of an example. Example: The subsequent documents were inserted into the Test collection.
First stage $project is to turn all keys into array to count fields. Second stage $group is to sum the number of keys/fields in the collection, also the number of documents processed. Third stage $project is subtracting the total number of fields with the total number of documents (As you don't want to count for _id ).
Iterate over the entire collection, and find the entire number of fields there are
Now you can utilise aggregation operator $objectToArray (SERVER-23310) to turn keys into values and count them. This operator is available in MongoDB v3.4.4+
For example:
db.collection.aggregate([
{"$project":{"numFields":{"$size":{"$objectToArray":"$$ROOT"}}}},
{"$group":{"_id":null, "fields":{"$sum":"$numFields"}, "docs":{"$sum":1}}},
{"$project":{"total":{"$subtract":["$fields", "$docs"]}, _id:0}}
])
First stage $project
is to turn all keys into array to count fields. Second stage $group
is to sum the number of keys/fields in the collection, also the number of documents processed. Third stage $project
is subtracting the total number of fields with the total number of documents (As you don't want to count for _id
).
You can easily add $avg to count for average on the last stage.
PRIMARY> var count = 0;
PRIMARY> db.my_table.find().forEach( function(d) { for(f in d) { count++; } });
PRIMARY> count
1074942
This is the most simple way I could figure out how to do this. On really large datasets, it probably makes sense to go the Map-Reduce path. But, while your set is small enough, this'll do.
This is O(n^2)
, but I'm not sure there is a better way.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With