
How to aggregate on huge array in mongoDB?

Tags:

mongodb

I have a MongoDB database of about 400 GB. The documents contain a variety of fields, but the important one here is an array of IDs.

So a JSON document might look like this:

{
 "name":"bob",
 "dob":"1/1/2011",
 "key":
      [  
       "1020123123",
       "1234123222",
       "5021297723"
      ]
}

The focal variable here is "key". There are about 10 billion keys in total across 50 million documents (so each document has about 200 keys). Keys can repeat, and there are about 15 million UNIQUE keys.

What I would like to do is return the 10,000 most common keys. I thought aggregate might do this, but I'm having a lot of trouble getting it to run. Here is my code:

db.users.aggregate( 
 [ 
  { $unwind : "$key" }, 
  { $group : { _id : "$key", number : { $sum : 1 } } },
  { $sort : { number : -1 } }, 
  { $limit : 10000 }
 ] 
);

Any ideas what I'm doing wrong?

asked Sep 26 '14 23:09 by AlexKogan


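1 Answer

Each aggregation pipeline stage is limited to 100 MB of RAM, and grouping on about 15 million unique keys will very likely exceed that. Try enabling allowDiskUse so the stages can spill to temporary files, and write the output to a collection with $out: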

db.users.aggregate( 
 [ 
  { $unwind : "$key" }, 
  { $group : { _id : "$key", number : { $sum : 1 } } },
  { $sort : { number : -1 } }, 
  { $limit : 10000 },
  { $out:"result"},
 ], {
  allowDiskUse:true,
  cursor:{}
 }
);

Then read the results with db.result.find().
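If you want to sanity-check what the pipeline computes before running it against the full collection, here is a minimal plain JavaScript sketch that mirrors the four stages in memory on a tiny sample. The function name topKeys and the sample documents are made up for illustration; this is not how MongoDB executes the pipeline internally, and it would not scale to 10 billion keys.

```javascript
// In-memory equivalent of $unwind -> $group -> $sort -> $limit.
// topKeys and the sample documents are hypothetical, for illustration only.
function topKeys(docs, limit) {
  const counts = new Map();
  for (const doc of docs) {
    // $unwind: visit each element of the "key" array separately
    // (documents without a "key" field contribute nothing)
    for (const k of doc.key || []) {
      // $group with { $sum: 1 }: count occurrences per key
      counts.set(k, (counts.get(k) || 0) + 1);
    }
  }
  return Array.from(counts, ([_id, number]) => ({ _id, number }))
    .sort((a, b) => b.number - a.number) // $sort: { number: -1 }
    .slice(0, limit);                    // $limit
}

const sample = [
  { name: "bob", key: ["1020123123", "1234123222", "5021297723"] },
  { name: "amy", key: ["1020123123", "1234123222"] },
  { name: "joe", key: ["1020123123"] },
];

const top2 = topKeys(sample, 2);
console.log(top2);
```

The output has the same shape as the documents in db.result (an _id holding the key and a number holding its count), so you can compare it against a run of the real pipeline on the same small sample.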

answered Nov 05 '22 17:11 by Wizard