I'm trying to get a list of the number of records that have arrays of varying size. I want to get the distribution of array sizes for all records so I can build a histogram like this:
| *
| *
documents | * *
| * * *
|_*__*__*___*__*___
2 5 6 23 47
Array Size
So the raw documents look something like this:
{hubs : [{stuff:0, id:6}, {stuff:1"}, .... ]}
{hubs : [{stuff:0, id:6}]}`
So far using the aggregation framework and some of the help here I've come up with
db.sitedata.aggregate([{ $unwind:'$hubs'},
{ $group : {_id:'$_id', count:{$sum:1}}},
{ $group : {_id:'$count', count:{$sum:1}}},
{ $sort : {_id: 1}}])
This seems to give me the results I want, but it's not very fast. I'm wondering if there is something I can do like this that may not need two group calls. The syntax is wrong here, but what I'm trying to do is put the count value in the first _id field:
db.sitedata.aggregate([{ $unwind:'$hubs'},
{ $group : {_id:{$count:$hubs}, count:1}},
{ $sort : { _id: 1 }}])
Comparing MySQL 8.0 and MongoDB 4.0. 3, it was seen that MongoDB is typically faster on more complex queries. It's faster from disk when there are no indexes, whereas MySQL is faster from RAM. BI Connector is slower for simple queries and not as fast as hand-crafted aggregation.
Without seeing your data and your query it is difficult to answer why aggregate+sort is faster than find+sort. A well indexed(Indexing that suits your query) data will always yield faster results on your find query.
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB. The aggregation pipeline can operate on a sharded collection. The aggregation pipeline can use indexes to improve its performance during some of its stages.
This is similar to the basic aggregation available in SQL with the GROUP BY clause and COUNT, SUM and AVG functions. MongoDB Aggregation goes further though and can also perform relational-like joins, reshape documents, create new and update existing collections, and so on.
Now that 2.6 is out, aggregation framework supports a new array operator $size
which will allow you to $project
the array size without having to unwind and re-group.
db.sitedata.aggregate([{ $project:{ 'count': { '$size':'$hubs'} } },
{ $group : {_id:'$count', count:{$sum:1} } },
{ $sort : { _id: 1 } } ] )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With