Robomongo : Exceeded memory limit for $group

Tags: duplicates, mongodb, out-of-memory

I'm using a script to remove duplicates in MongoDB. It worked on a test collection with 10 items, but when I ran it on the real collection with 6 million documents, I got an error.

This is the script which I ran in Robomongo (now known as Robo 3T):

    var bulk = db.getCollection('RAW_COLLECTION').initializeOrderedBulkOp();
    var count = 0;

    db.getCollection('RAW_COLLECTION').aggregate([
      // Group on unique value storing _id values to array and count
      { "$group": {
        "_id": { RegisterNumber: "$RegisterNumber", Region: "$Region" },
        "ids": { "$push": "$_id" },
        "count": { "$sum": 1 }
      }},
      // Only return things that matched more than once. i.e a duplicate
      { "$match": { "count": { "$gt": 1 } } }
    ]).forEach(function(doc) {
      var keep = doc.ids.shift();     // takes the first _id from the array

      bulk.find({ "_id": { "$in": doc.ids }}).remove(); // remove all remaining _id matches
      count++;

      if ( count % 500 == 0 ) {  // only actually write per 500 operations
          bulk.execute();
          bulk = db.getCollection('RAW_COLLECTION').initializeOrderedBulkOp();  // re-init after execute
      }
    });

    // Clear any queued operations
    if ( count % 500 != 0 )
        bulk.execute();

This is the error message:

    Error: command failed: {
        "errmsg" : "exception: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.",
        "code" : 16945,
        "ok" : 0
    } : aggregate failed :
    _getErrorWithCode@src/mongo/shell/utils.js:23:13
    doassert@src/mongo/shell/assert.js:13:14
    assert.commandWorked@src/mongo/shell/assert.js:266:5
    DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1215:5
    @(shell):1:1

So do I need to set allowDiskUse: true for this to work? Where do I put it in the script, and are there any downsides to doing so?

asked May 24 '17 14:05

Carlos Siestrup

2 Answers

{ allowDiskUse: true } should be passed as the second argument to aggregate(), right after the pipeline array.

In your code it would look like this:

    db.getCollection('RAW_COLLECTION').aggregate([
      // Group on unique value storing _id values to array and count
      { "$group": {
        "_id": { RegisterNumber: "$RegisterNumber", Region: "$Region" },
        "ids": { "$push": "$_id" },
        "count": { "$sum": 1 }
      }},
      // Only return things that matched more than once. i.e a duplicate
      { "$match": { "count": { "$gt": 1 } } }
    ], { allowDiskUse: true } )

Note: Using { allowDiskUse: true } can slow the aggregation down, because a stage that exceeds the in-memory limit (100 MB per pipeline stage by default) writes its data to temporary files on disk and reads it back from there. How much slower depends on disk performance and the size of your working set, so test performance for your use case.
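To show where the option sits in the whole clean-up script, here is a minimal sketch that wraps the question's logic in a function. The name `removeDuplicates` and its `batchSize` parameter are my own for illustration, not part of the shell API; the only calls it relies on are `aggregate`, `initializeOrderedBulkOp`, `find(...).remove()` and `execute`, as used in the question.

```javascript
// Sketch only: `removeDuplicates` and `batchSize` are hypothetical names.
// `coll` is a collection handle such as db.getCollection('RAW_COLLECTION').
function removeDuplicates(coll, batchSize) {
  var pipeline = [
    // Group on the duplicate key, collecting the _id of every member
    { $group: {
        _id: { RegisterNumber: "$RegisterNumber", Region: "$Region" },
        ids: { $push: "$_id" },
        count: { $sum: 1 }
    } },
    // Keep only keys that occur more than once
    { $match: { count: { $gt: 1 } } }
  ];

  var bulk = coll.initializeOrderedBulkOp();
  var pending = 0;        // operations queued since the last execute()
  var groupsHandled = 0;  // duplicate groups seen so far

  // The options document is the second argument, after the pipeline array
  coll.aggregate(pipeline, { allowDiskUse: true }).forEach(function (doc) {
    doc.ids.shift(); // drop the first _id from the list so that document is kept
    bulk.find({ _id: { $in: doc.ids } }).remove();
    pending++;
    groupsHandled++;

    if (pending === batchSize) {
      bulk.execute();
      bulk = coll.initializeOrderedBulkOp(); // re-init after execute
      pending = 0;
    }
  });

  // Flush whatever is still queued
  if (pending > 0) {
    bulk.execute();
  }
  return groupsHandled;
}

// Usage in the shell (not executed here):
// removeDuplicates(db.getCollection('RAW_COLLECTION'), 500);
```

Tracking `pending` separately means execute() is only ever called on a bulk object that has at least one queued operation.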

answered Sep 20 '22 14:09

Atish

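With large collections it is generally better to put a $match before the $group, so that fewer documents reach the $group stage. If the $match filters out enough documents, the $group can stay within the memory limit and you will not hit this error.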

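    db.getCollection('sample').aggregate([
      // Filter first so that $group only sees the matching documents
      { $match: { State: 'TAMIL NADU' } },
      { $group: {
          _id: { DiseCode: "$code", State: "$State" },
          totalCount: { $sum: 1 }
      }},
      { $project: {
          Code: "$_id.DiseCode",   // the group key field is named DiseCode
          totalCount: "$totalCount",
          _id: 0
      }}
    ])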

If you really need to run the $group without a preceding $match, then the solution is { allowDiskUse: true }.
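Applied to the duplicate search in the question, the $match-first idea could mean processing one Region at a time. Region is part of the duplicate key, so duplicates never span two regions and each $group only has to hold one region's documents. This is a rough sketch under that assumption; `forEachDuplicateGroupPerRegion` and `onGroup` are hypothetical names, and documents that lack a Region field would need separate handling.

```javascript
// Sketch only: `forEachDuplicateGroupPerRegion` and `onGroup` are hypothetical names.
// `coll` is a collection handle such as db.getCollection('RAW_COLLECTION').
function forEachDuplicateGroupPerRegion(coll, onGroup) {
  // distinct() returns the list of Region values present in the collection
  coll.distinct("Region").forEach(function (region) {
    coll.aggregate([
      // Narrow the input first so $group works on one region only
      { $match: { Region: region } },
      // Within a region, duplicates share the same RegisterNumber
      { $group: {
          _id: "$RegisterNumber",
          ids: { $push: "$_id" },
          count: { $sum: 1 }
      } },
      { $match: { count: { $gt: 1 } } }
    ]).forEach(function (doc) {
      onGroup(region, doc);
    });
  });
}

// Usage in the shell (not executed here):
// forEachDuplicateGroupPerRegion(db.getCollection('RAW_COLLECTION'),
//   function (region, doc) { print(region + ": " + doc.ids.length + " copies"); });
```

Whether this is faster than a single pass with allowDiskUse depends on how many regions there are and whether the leading $match can use an index on Region, so it is worth measuring both.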

answered Sep 16 '22 14:09

Thavaprakash Swaminathan
