MongoDB aggregation: count array/set size

Here's my problem:

Model:

{ application: "abc", date: Time.now, status: "1", user_id: [ id1, id2, id4] }

{ application: "abc", date: Time.yesterday, status: "1", user_id: [ id1, id3, id5] }

{ application: "abc", date: Time.yesterday-1, status: "1", user_id: [ id1, id3, id5] }

I need to count the number of unique user_ids within a period of time.

Expected result:

{ application: "abc", status: "1", unique_id_count: 5 }

I'm currently using the aggregation framework to collect the ids and then counting them outside MongoDB.

{ $match: { application: "abc" } }, { $unwind: "$user_id" }, { $group: { _id: { status: "$status" }, users: { $addToSet: "$user_id" } } }

My arrays of user ids are very large, so I have to iterate over the dates or I hit the maximum document size limit (16 MB).

I could also $group by

{ year: { $year: "$date" }, month: { $month: "$date" }, day: { $dayOfMonth: "$date" } }

but I still hit the document size limit.

Is it possible to count the set size inside MongoDB?

Thanks

asked Jan 28 '13 18:01 by user2019059

2 Answers

The following returns the number of unique users per status for the application. It applies a second $group to the result of the first $group, using MongoDB's pipeline feature.

{ $match: { application: "abc" } }, 
{ $unwind: "$user_id" }, 
{ $group: { _id: "$status", users: { $addToSet: "$user_id" } } }, 
{ $unwind: "$users" }, 
{ $group: { _id: "$_id", count: { $sum: 1 } } }

Hopefully a future MongoDB release will make this easier with an operator that returns the size of an array inside a projection, for example {$project: {id: "$_id", count: {$size: "$users"}}}. See https://jira.mongodb.org/browse/SERVER-4899
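
For what it's worth, $size has been available as an aggregation expression since MongoDB 2.6. Below is a minimal sketch of how the pipeline could look with it, assuming the array field is named user_id as in the question's model; the collection name "events" and the output field names are assumptions for illustration. The set built by $addToSet still lives in a single document, so this variant remains subject to the 16 MB limit for very large id sets.

```javascript
// Pipeline as a plain array; in the mongo shell it would be passed to
// db.events.aggregate(pipeline) ("events" is a hypothetical collection name).
const pipeline = [
  { $match: { application: "abc" } },
  { $unwind: "$user_id" },
  // Collect the distinct ids per status into one array
  { $group: { _id: "$status", users: { $addToSet: "$user_id" } } },
  // Replace the array with its length
  { $project: { _id: 0, status: "$_id", unique_id_count: { $size: "$users" } } },
];

console.log(JSON.stringify(pipeline, null, 2));
```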

Cheers

answered Oct 07 '22 18:10 by cubbuk


Sorry I'm a little late to the party. Simply grouping on 'user_id' and then counting the results with a trivial second group works just fine and doesn't run into document size limits, because no stage ever holds the whole set of ids in a single document.

[
    {$match: {application: 'abc', date: {$gte: startDate, $lte: endDate}}},
    {$unwind: '$user_id'},
    {$group: {_id: '$user_id'}},
    {$group: {_id: 'singleton', count: {$sum: 1}}}
];
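
To illustrate why the two $group stages produce the distinct count, here is a minimal plain-JavaScript sketch that mimics the pipeline in memory on the three sample documents from the question. The helper name countUniqueUserIds, the string ids, and the concrete dates are assumptions for illustration only; against a real collection the work is done by passing the pipeline above to aggregate().

```javascript
// Sample documents modelled on the question; ids and dates are placeholders.
const docs = [
  { application: "abc", date: new Date("2013-01-28"), status: "1", user_id: ["id1", "id2", "id4"] },
  { application: "abc", date: new Date("2013-01-27"), status: "1", user_id: ["id1", "id3", "id5"] },
  { application: "abc", date: new Date("2013-01-26"), status: "1", user_id: ["id1", "id3", "id5"] },
];

// Hypothetical helper emulating $match -> $unwind -> $group -> $group in memory.
function countUniqueUserIds(documents, application, startDate, endDate) {
  const seen = new Set();
  for (const doc of documents) {
    // $match: same application, date inside the inclusive range
    if (doc.application !== application) continue;
    if (doc.date < startDate || doc.date > endDate) continue;
    // $unwind + first $group: one entry per distinct user_id
    for (const id of doc.user_id) seen.add(id);
  }
  // Second $group: count the distinct entries.
  // Unlike this sketch, the real pipeline returns no document when nothing matches.
  return { _id: "singleton", count: seen.size };
}

const result = countUniqueUserIds(docs, "abc", new Date("2013-01-26"), new Date("2013-01-28"));
console.log(result); // { _id: 'singleton', count: 5 }
```

The first $group emits one small document per distinct id, which is why the ids never have to fit into a single 16 MB document.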

answered Oct 07 '22 18:10 by mjhm