I have a MongoDB database of about 400 GB. The documents contain a variety of fields, but the important one here is an array of IDs.
So a document might look like this:
{
    "name": "bob",
    "dob": "1/1/2011",
    "key": [
        "1020123123",
        "1234123222",
        "5021297723"
    ]
}
The focal variable here is "key". There are about 10 billion keys in total across 50 million documents (so each document has about 200 keys on average). Keys can repeat, and there are about 15 million UNIQUE keys.
What I would like to do is return the 10,000 most common keys. I thought aggregate might do this, but I'm having a lot of trouble getting it to run. Here is my code:
db.users.aggregate(
    [
        { $unwind : "$key" },
        { $group : { _id : "$key", number : { $sum : 1 } } },
        { $sort : { number : -1 } },
        { $limit : 10000 }
    ]
);
Any ideas what I'm doing wrong?
On large collections of millions of documents, MongoDB's aggregation has been shown to perform much worse than Elasticsearch, and performance degrades further with collection size once MongoDB starts spilling to disk because system RAM is limited; a $lookup stage used without supporting indexes can also be very slow (see "Aggregation is slow" in the Working with Data category of the MongoDB Developer Community Forums). More relevant here: each blocking stage such as $group and $sort is limited to 100 MB of RAM, and grouping 15 million unique keys will exceed that unless allowDiskUse is enabled so the stage can spill to temporary files.
That said, the aggregation pipeline is still the preferred way to aggregate data in MongoDB: it runs as native operations inside the server and can operate on a sharded collection.
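For what it's worth, if the server is on MongoDB 3.4 or newer, the $unwind / $group / $sort counting pattern can be collapsed into the built-in $sortByCount stage. This is only an equivalent sketch of the same query (the count ends up in a field called count rather than number), not a different fix:
db.users.aggregate(
    [
        { $unwind : "$key" },
        { $sortByCount : "$key" },   // shorthand for $group on "$key" with a count, then $sort descending
        { $limit : 10000 }
    ],
    { allowDiskUse : true }
);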
Try this:
db.users.aggregate(
    [
        { $unwind : "$key" },
        { $group : { _id : "$key", number : { $sum : 1 } } },
        { $sort : { number : -1 } },
        { $limit : 10000 },
        { $out : "result" }
    ],
    {
        allowDiskUse : true,
        cursor : {}
    }
);
Then read the results back with db.result.find().
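To spot-check the output (assuming the $out target collection is named result, as above), sort on the number field explicitly rather than relying on insertion order:
db.result.find().sort({ number : -1 }).limit(10)   // top 10 keys and their counts
db.result.count()                                  // should be 10000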