How can I decode the meaning of memory data in node.js & debug the memory leak?

I've got an RSS to MongoDB reader/scraper that runs through a data set larger than my system has memory for. As I loop through the data, the system slows down. I'm reasonably sure that's because I'm running out of memory.

I've added some debug info and have made a few changes, but I don't know how to read the information given in the debug output.

Here's a debug output sample (from before it gets deadly):

 100 items 
 Memory: { rss: 11104256,        // what is RSS?
           vsize: 57507840,      // what is VSIZE?
           heapTotal: 4732352,   // heapTotal?
           heapUsed: 3407624 }   // heapUsed?
 200 items
 Memory: { rss: 12533760,
           vsize: 57880576,
           heapTotal: 6136320,
           heapUsed: 3541984 }
                                 // what key numbers do I watch for?
                                 // when do I reach 'situation critical'? 
                                 // how do I free up memory to prevent problems?

Also, if it helps and for better illustration, I've included a sample of the code. One change I've made already is moving all of the require statements outside of the GrabRss function.

var http    = require('http');
var sys     = require('sys');
var xml2js  = require('xml2js');
var util    = require('util');
var Db      = require('../lib/mongodb').Db,
    Conn    = require('../lib/mongodb').Connection,
    Server  = require('../lib/mongodb').Server,
    // BSON = require('../lib/mongodb').BSONPure;
    BSON    = require('../lib/mongodb').BSONNative;

GrabRss = function(grab, start) {           
    var options = {
        host: 'www.example.com',
        port: 80,
        path: '/rss/'+grab+'/'+start
    };

    var data;
    var items;
    var checked = 0;
    var len = 0;

    GotResponse = function(res) {
        var ResponseBody = "";
        res.on('data', DoChunk);
        res.on('end', EndResponse);

        function DoChunk(chunk){
            ResponseBody += chunk;
        }
        function EndResponse() {
            //console.log(ResponseBody);
            var parser = new xml2js.Parser();
            parser.addListener('end', GotRSSObject);
            parser.parseString(ResponseBody);
        }
    }

    GotError = function(e) {
        console.log("Got error: " + e.message);
    }

    GotRSSObject = function(r){
        items = r.item;
        //console.log(sys.inspect(r));

        var db = new Db('rss', new Server('localhost', 27017, {}), {native_parser:false});
        db.open(function(err, db){
             db.collection('items', function(err, col) {
                len = items.length;
                if (len === 0) {
                    process.exit(0);
                }
                for (i in items) {
                    SaveMovie(items[i], col);
                }
             });
        });
    }

    SaveMovie = function(i, c) {
        c.update({'id': i.id}, {$set: i}, {upsert: true, safe: true}, function(err){
            if (err) console.warn(err.message);
            if (++checked >= len) {
                if (checked < 5000) {
                        delete data;   // added since asking
                        delete items; // added since asking

                    console.log(start+checked);
                    console.log('Memory: '+util.inspect(process.memoryUsage()));
                    GrabRss(50, start+checked);
                } else {
                    console.log(len);
                    process.exit(0);
                }
            } else if (checked % 10 == 0) {
                console.log(start+checked);
            }
        });
    }
    http.get(options, GotResponse).on('error', GotError);

}
GrabRss(50, 0);
Alex C asked May 29 '11 20:05


1 Answer

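After reading through this code, I do see that several names are assigned without var and therefore become globals: GrabRss, GotResponse, GotError, GotRSSObject, SaveMovie, and the loop counter i in GotRSSObject. (items itself is declared with var inside GrabRss, so it is local to each call.) Prefacing those assignments with var keeps each call's functions and closures separate.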
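To illustrate the implicit-global behavior, here is a minimal sketch. It runs two small snippets in a fresh context using Node's built-in vm module, so the result does not depend on whether the surrounding file is in strict mode. The names globalsCreatedBy, handler and i are made up for the demonstration and are not taken from the question's code.

```javascript
const vm = require('vm');

// Run a snippet with its own global object and list the globals it created.
function globalsCreatedBy(source) {
  const sandbox = {};
  vm.runInNewContext(source, sandbox);
  return Object.keys(sandbox).sort();
}

// Assignments without `var` inside a function create globals (non-strict code).
const leaky = globalsCreatedBy(
  '(function () { handler = function () {}; for (i in [1, 2]) {} })();'
);

// With `var`, the same names stay local to the function.
const tidy = globalsCreatedBy(
  '(function () { var handler = function () {}; for (var i in [1, 2]) {} })();'
);

console.log('without var:', leaky);
console.log('with var:', tidy);
```

The first list should contain handler and i, and the second should be empty.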

Aside from that, I see no other obvious memory leaks. A good basic technique is to add more print statements to see where memory is being allocated, and then to check that it is released where you expect, for example by asserting that the variables holding large data are null once you are done with them.
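One way to do the print-statement approach is a small checkpoint helper that reports how much heapUsed changed since the previous call. This is only a sketch: checkpoint and lastHeapUsed are hypothetical names, and the array is a stand-in for a batch of parsed items.

```javascript
// Remember the previous reading so each checkpoint can report a delta.
let lastHeapUsed = process.memoryUsage().heapUsed;

// Log heapUsed (in KB) and the change since the last checkpoint.
function checkpoint(label) {
  const heapUsed = process.memoryUsage().heapUsed;
  const deltaKb = Math.round((heapUsed - lastHeapUsed) / 1024);
  lastHeapUsed = heapUsed;
  const line = label + ': heapUsed ' + Math.round(heapUsed / 1024) + ' KB (' +
    (deltaKb >= 0 ? '+' : '') + deltaKb + ' KB)';
  console.log(line);
  return line;
}

checkpoint('start');
let buffer = new Array(100000).fill('some item text');
checkpoint('after allocating');
buffer = null; // drop the reference so the array becomes collectable
checkpoint('after releasing');
```

The last delta will not necessarily be negative, because the collector runs on its own schedule. What matters is whether the numbers keep climbing from one batch to the next.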

The problem with memory in node.js and V8 is that garbage collection is not guaranteed to happen at any particular time, and by default you can't force it (starting node with the --expose-gc flag makes a global gc() function available, which is useful while debugging). You'll want to limit the amount of data you're working with at once so that it fits easily in memory, and add some flow control (perhaps with setTimeout or process.nextTick) so that the next batch waits until the previous one has been released.
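A rough sketch of that batching idea follows. fetchBatch and saveBatch are hypothetical stand-ins for the HTTP request and the MongoDB upserts in the question, and the 120-item feed is invented so the example can run on its own. It uses promises and async/await, which did not exist when the question was asked.

```javascript
// Hypothetical stand-in for the HTTP fetch: pretend the feed has 120 items.
function fetchBatch(start, size) {
  const total = 120;
  const items = [];
  for (let id = start; id < Math.min(start + size, total); id++) {
    items.push({ id: id, title: 'item ' + id });
  }
  return Promise.resolve(items);
}

// Hypothetical stand-in for the MongoDB upserts: resolves with the number saved.
function saveBatch(items) {
  return Promise.resolve(items.length);
}

// Yield to the event loop for a moment between batches.
function pause(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

async function processFeed(batchSize) {
  let saved = 0;
  for (;;) {
    // `items` is scoped to this iteration, so the previous batch becomes unreachable.
    const items = await fetchBatch(saved, batchSize);
    if (items.length === 0) break;
    saved += await saveBatch(items);
    // Only available when node was started with --expose-gc.
    if (typeof global.gc === 'function') global.gc();
    await pause(10);
  }
  return saved;
}

processFeed(50).then(function (saved) {
  console.log('saved ' + saved + ' items');
});
```

The point of the design is that only one batch is referenced at any time, and the next request does not start until the current batch has been saved.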

A word of advice about nextTick: it's a very, very fast call. Node.js runs your JavaScript on a single thread driven by an event loop. A nextTick callback runs as soon as the current operation finishes (in current Node versions that is before the loop moves on to pending I/O), so make sure you don't call it too often or recursively; otherwise you'll find yourself wasting cycles and can starve I/O.
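To see how early a nextTick callback runs, this small sketch records the order in which three pieces of code execute. The order array is just an illustration, and setImmediate is used for comparison even though it was added to Node after this question was asked.

```javascript
const order = [];

// Scheduled first, but runs last: setImmediate waits for the event loop.
setImmediate(function () { order.push('setImmediate'); });

// Runs right after the current synchronous code finishes.
process.nextTick(function () { order.push('nextTick'); });

order.push('sync');

// Print the recorded order once everything above has had time to run.
setTimeout(function () { console.log(order.join(' -> ')); }, 20);
```

If you want to give I/O and the rest of the loop a turn between batches, setTimeout or setImmediate is usually the safer choice.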

And regarding rss, vsize, heapTotal and heapUsed (all reported in bytes): vsize is the total virtual memory your process is using, and rss (resident set size) is how much of that is in actual physical RAM and not swap. heapTotal and heapUsed refer to V8's heap, which you have no direct control over: heapTotal is what V8 has allocated for the heap and heapUsed is the part currently occupied by objects. You'll mostly be concerned with vsize (newer Node versions no longer report it, in which case watch rss and heapUsed), but you can also get more detailed information with top or Activity Monitor on OS X (does anyone know of good process visualization tools on *nix systems?).
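Since the raw byte counts are hard to read at a glance, a small formatter can help. This is a sketch: memoryInMb is a hypothetical helper, applied first to the sample from the question and then to the running process.

```javascript
// Convert every field of a memoryUsage-style object from bytes to megabytes.
function memoryInMb(usage) {
  const result = {};
  Object.keys(usage).forEach(function (key) {
    result[key] = (usage[key] / (1024 * 1024)).toFixed(1) + ' MB';
  });
  return result;
}

// First sample from the question's debug output.
const sample = memoryInMb({
  rss: 11104256,
  vsize: 57507840,
  heapTotal: 4732352,
  heapUsed: 3407624
});
console.log(sample);

// Current process (the set of fields depends on the Node version).
console.log(memoryInMb(process.memoryUsage()));
```

Read this way, the first sample in the question is roughly 10.6 MB resident, 54.8 MB virtual, with 3.2 MB of a 4.5 MB heap in use.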

tjarratt answered Nov 14 '22 22:11