I must do some data processing for one of my company's clients. They have a MongoDB database of about 4.7 GB. I need to add a field to each document, calculated from two properties of the Mongo document and an external reference.
My problem is that I cannot do collection.find() because Node.js runs out of memory. What is the best way to iterate through an entire collection that is too large to load with a single call to find()?
Yes, there is a way. Mongo is designed to handle large datasets.
You are probably running out of memory, not because of db.collection.find(), but because you are trying to dump it all at once with something like db.collection.find().toArray().
The correct way to operate over result sets that are bigger than memory is to use cursors. Here's how you'd do it in the mongo console:
var outsidevars = {
    "z": 5
};

var manipulator = function(document, outsidevars) {
    var newfield = document.x + document.y + outsidevars.z;
    document.newField = newfield;
    return document;
};
var cursor = db.collection.find();

while (cursor.hasNext()) {
    // pull only the next document from the result set into memory
    var thisdoc = cursor.next();
    var newdoc = manipulator(thisdoc, outsidevars);
    db.collection.update({"_id": thisdoc["_id"]}, newdoc);
}
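Since the question is about Node.js, here is a minimal sketch of the same cursor loop using the official mongodb driver. The connection string, database name mydb, and collection name docs are placeholders, and a driver version with promise support (v4+) is assumed:

const { MongoClient } = require('mongodb');

const outsidevars = { z: 5 };

async function run() {
    // Placeholder connection string and names; adjust for your setup
    const client = await MongoClient.connect('mongodb://localhost:27017');
    const collection = client.db('mydb').collection('docs');

    // find() returns a cursor; the driver fetches documents in batches,
    // so memory use stays bounded no matter how large the collection is
    const cursor = collection.find();
    while (await cursor.hasNext()) {
        const doc = await cursor.next();
        const newField = doc.x + doc.y + outsidevars.z;
        await collection.updateOne(
            { _id: doc._id },
            { $set: { newField: newField } }
        );
    }

    await client.close();
}

run().catch(console.error);

Note that $set updates only the new field rather than replacing the whole document, which is a safer default for a bulk pass like this.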