I've got an RSS-to-MongoDB reader/scraper that runs through a data set larger than my system has memory for. As I loop through the data, the system slows down, and I'm reasonably sure that's because I'm running out of memory.
I've added some debug output and made a few changes, but I don't know how to interpret the information the debug output gives me.
Here's a debug output sample (from before it gets deadly):
100 items
Memory: { rss: 11104256, // what is RSS?
vsize: 57507840, // what is VSIZE?
heapTotal: 4732352, // heapTotal?
heapUsed: 3407624 } // heapUsed?
200 items
Memory: { rss: 12533760,
vsize: 57880576,
heapTotal: 6136320,
heapUsed: 3541984 }
// what key numbers do I watch for?
// when do I reach 'situation critical'?
// how do I free up memory to prevent problems?
Also, in case it helps illustrate the problem, I've included a sample of the code. One change I've made already is moving all of the require statements outside of the GrabRss function.
var http = require('http');
var sys = require('sys');
var xml2js = require('xml2js');
var util = require('util');
var Db = require('../lib/mongodb').Db,
    Conn = require('../lib/mongodb').Connection,
    Server = require('../lib/mongodb').Server,
    // BSON = require('../lib/mongodb').BSONPure;
    BSON = require('../lib/mongodb').BSONNative;
GrabRss = function(grab, start) {
    var options = {
        host: 'www.example.com',
        port: 80,
        path: '/rss/' + grab + '/' + start
    };

    var data;
    var items;
    var checked = 0;
    var len = 0;

    GotResponse = function(res) {
        var ResponseBody = "";
        res.on('data', DoChunk);
        res.on('end', EndResponse);

        function DoChunk(chunk) {
            ResponseBody += chunk;
        }

        function EndResponse() {
            //console.log(ResponseBody);
            var parser = new xml2js.Parser();
            parser.addListener('end', GotRSSObject);
            parser.parseString(ResponseBody);
        }
    }

    GotError = function(e) {
        console.log("Got error: " + e.message);
    }

    GotRSSObject = function(r) {
        items = r.item;
        //console.log(sys.inspect(r));
        var db = new Db('rss', new Server('localhost', 27017, {}), {native_parser: false});
        db.open(function(err, db) {
            db.collection('items', function(err, col) {
                len = items.length;
                if (len === 0) {
                    process.exit(0);
                }
                for (i in items) {
                    SaveItem(items[i], col);
                }
            });
        });
    }

    SaveItem = function(i, c) {
        c.update({'id': i.id}, {$set: i}, {upsert: true, safe: true}, function(err) {
            if (err) console.warn(err.message);
            if (++checked >= len) {
                if (checked < 5000) {
                    delete data;  // added since asking
                    delete items; // added since asking
                    console.log(start + checked);
                    console.log('Memory: ' + util.inspect(process.memoryUsage()));
                    // grab the next batch, starting where this one left off
                    GrabRss(50, start + checked);
                } else {
                    console.log(len);
                    process.exit(0);
                }
            } else if (checked % 10 == 0) {
                console.log(start + checked);
            }
        });
    }

    http.get(options, GotResponse).on('error', GotError);
}

GrabRss(50, 0);
After reading through this code, I do see that items in GotRSSObject is declared as a global, because there is no var prefacing it.
Aside from that, I see no other obvious memory leaks. A good basic technique is to add some more print statements to see where memory is being allocated, and then to check, at the points where you would expect that memory to have been cleaned up, that the variables really are null. A rough sketch of that kind of checkpoint logging is below.
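For instance (a minimal sketch only; the helper name and the checkpoint labels are just illustrative), you could wrap process.memoryUsage() in a small function and call it at the points where you expect references to have been dropped:

var util = require('util');

// hypothetical helper: print a label plus the current memory counters
function memCheckpoint(label) {
    var m = process.memoryUsage();
    console.log(label + ': ' + util.inspect(m));
}

memCheckpoint('before save');  // e.g. right after parsing the feed
// ... save the batch, then null out references you expect to be collected ...
// items = null;
memCheckpoint('after save');   // if rss/heapUsed keeps climbing here, something is still referenced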
The problem with memory in node.js and V8 is that garbage collection isn't guaranteed to happen at any particular time and, as far as I know, you can't force it. You'll want to limit the amount of data you're working with so it easily fits within memory, and add some handling (perhaps with setTimeout or process.nextTick) to wait until memory has been cleaned up before starting the next batch.
A word of advice with nextTick - it's a very, very fast call. Node.js is single threaded on an event loop, as everyone knows. Using nextTick will literally execute that function on the very next pass through the loop - make sure you don't call it too often, otherwise you'll find yourself wasting cycles. See the sketch after this paragraph.
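As a rough illustration (this assumes the GrabRss(grab, start) signature from the question; the 100 ms delay is an arbitrary placeholder, not a recommended value), you could schedule the next batch instead of recursing straight from the update callback:

// instead of calling the next batch immediately:
// GrabRss(50, start + checked);

// give the event loop (and the garbage collector) a chance to run first
setTimeout(function() {
    GrabRss(50, start + checked);
}, 100);

Using process.nextTick(function() { GrabRss(50, start + checked); }) would run it sooner, but as noted above it fires on the very next pass through the loop, so it buys you much less breathing room than a timeout.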
And regarding rss, vsize, heapTotal, heapUsed: vsize is the entire size of memory that your process is using, and rss is how much of that is in actual physical RAM rather than swap. heapTotal and heapUsed refer to V8's underlying storage, which you have no direct control over. You'll mostly be concerned with vsize, but you can also get more detailed information with top, or Activity Monitor on OS X (anyone know of good process visualization tools on *nix systems?).
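For example (just arithmetic on the numbers from the debug output above, using 1 MB = 1,048,576 bytes), converting the counters to megabytes makes the trend between batches easier to watch:

// hypothetical helper: report the counters in MB so growth between batches is easy to spot
function logMemoryMB(label) {
    var m = process.memoryUsage();
    function mb(bytes) { return (bytes / 1048576).toFixed(1) + ' MB'; }
    console.log(label + ': rss ' + mb(m.rss) +
                ', heapTotal ' + mb(m.heapTotal) +
                ', heapUsed ' + mb(m.heapUsed));
}

// the first sample in the question works out to roughly:
// rss 10.6 MB, vsize 54.8 MB, heapTotal 4.5 MB, heapUsed 3.2 MB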