Should I worry about cleaning up large objects in Node.js or leave it for the garbage collector?

Recently I ran into an issue with a Node.js API where memory usage grew larger with every request. I'm hosting the server on Heroku's free tier, which only provides 512MB of RAM. After getting a lot of traffic over the weekend, I started getting memory-exceeded errors from Heroku, so I began searching for a memory leak in my code, to no avail. I wasn't keeping any objects around, everything should have been getting cleaned up, and quite frankly, I was lost.

However, after doing some research, I found that Node.js runs the garbage collector when the max-old-space-size limit is reached, which defaults to 1024MB on 64-bit systems. I've set this to 410 (80% of my available memory), but I wonder if I should just handle this in the code instead. Obviously it would be ideal to upgrade my instance and keep the normal default cap, but that's not an option right now.
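
For reference, the limit is set with a V8 flag when starting the Node.js process (server.js is just a placeholder entry point here); the same flag can also be supplied through the NODE_OPTIONS environment variable:

# limit V8's old-generation heap to ~410MB
node --max-old-space-size=410 server.js

# equivalent, supplied via the environment
NODE_OPTIONS="--max-old-space-size=410" node server.js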

Example:

// lets assume there is some apiGet function
// that calls back with a very very large object with
// the following structure:
// {
//      status: "success",
//      statusCode: 200,
//      messages: [],
//      data: { users: [ huge array of users ] }
// }
// we have a manipulateData function that's going
// to do some stuff to the returned data and then
// call some callback to give the data to the caller
function manipulateData(callback) {
    apiGet(function(error, veryLargeObject) {
        var users = veryLargeObject.data.users;
        var usefulUsers = users.map(function(user) {
            // do some data calculations here and then
            // return just those properties we needed
        });

        callback(null, usefulUsers);
    });
}

So in this example, once manipulateData is done running, if I understand correctly, "veryLargeObject" is now eligible for garbage collection since there are no more references pointing to it (the returned usefulUsers is a new array created by map). But this doesn't necessarily mean that all the memory it was taking up is freed, correct? Would it be wise to set veryLargeObject = null or undefined before calling the callback?
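
For illustration, the explicit-release version I'm asking about would look something like this (same hypothetical apiGet as above):

function manipulateData(callback) {
    apiGet(function(error, veryLargeObject) {
        var users = veryLargeObject.data.users;
        var usefulUsers = users.map(function(user) {
            // same calculations as above
        });

        // drop the only references to the large response
        // before handing control back to the caller
        veryLargeObject = null;
        users = null;

        callback(null, usefulUsers);
    });
}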

I hope what I'm asking makes sense. Basically: Is it a good idea to set large objects to null or undefined when there's no intent to use them anymore or should they just be left around for the garbage collector to clean up? Does the answer to this question change when you're only given 512MB of RAM vs having 8GB of RAM?


2 Answers

If you're certain that a given object is no longer needed, then setting its reference to null is the way to go (be aware that this does not imply that objects it links to will be garbage-collected too). Only when all references to the object are gone (it becomes unreachable from anywhere in your code) will it be collected.
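
A minimal sketch of that point (the variable names here are made up for illustration):

var big = { data: new Array(1000000) };
var alias = big;  // a second reference to the same object

big = null;       // the object is still reachable through alias,
                  // so it cannot be collected yet

alias = null;     // now nothing references it, so it becomes
                  // eligible for collection on the next GC run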

Since Node.js uses the V8 engine under the hood, you can get some hints on how to improve garbage collection from A tour of V8: Garbage Collection. If that's not enough, you can force GC by following these instructions.
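
Typically that means starting Node.js with the --expose-gc flag and calling global.gc() yourself; a minimal sketch (manual GC pauses the event loop, so use it sparingly):

// start with: node --expose-gc server.js
if (global.gc) {
    global.gc(); // trigger a full garbage collection
} else {
    console.warn('GC not exposed; run node with --expose-gc');
}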


Setting a variable to null is only necessary in certain kinds of closure situations, e.g.:

function createClosure(bigData) {
    var unrelatedVar = 1;

    // theCallback closes over bigData...
    doSomethingAsync(function theCallback(err, result) {
        if (bigData.matches(result)) {
            ...
        }
    });

    // ...while theReturnedFunction closes over unrelatedVar,
    // yet both share the same context object (explained below)
    return function theReturnedFunction() {
        return unrelatedVar++;
    };
}

In V8, same-level closures share a single context object that holds the closed-over variables. All the same-level closures point to that context object, so it stays alive until every one of those functions is dead. Here theReturnedFunction and theCallback are same-level functions, and both point to the same context object with two members: bigData and unrelatedVar. So as long as the returned function is alive, bigData stays alive as well, even though nothing can reference it anymore.

This is really easy to fall into because closed-over variables look exactly like local variables, while in fact they act like an object's fields (which would use an explicit this.field, so the retention is always obvious). It's no different from having to set an explicit object's .bigData field to null once it's unused, but with an explicit object it's much harder to miss.
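
A minimal sketch of the fix, using the same hypothetical doSomethingAsync: clear the variable once the callback is done with it, so the shared context no longer pins the data:

function createClosure(bigData) {
    var unrelatedVar = 1;

    doSomethingAsync(function theCallback(err, result) {
        if (bigData.matches(result)) {
            // ... use bigData here ...
        }
        // release the shared context's reference so that
        // theReturnedFunction no longer keeps bigData alive
        bigData = null;
    });

    return function theReturnedFunction() {
        return unrelatedVar++;
    };
}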
