
Node doesn't gc my object properly

Here's a simple case:

const libxmljs = require("libxmljs");
const html = `<<some huge html file>>`;

class MyObject {
  constructor(html) {
    this.doc = libxmljs.parseHtml(html);
    this.node = this.doc.root();
  }
}

let obj;

for (let i = 0; i < 100000; i++) {
  obj = new MyObject(html);
  // if I uncomment the next line it works fine
  // obj.node = null
  console.log(i);
}

When I run it, the script quickly runs out of memory, apparently because obj.node isn't getting garbage collected properly. How can I make sure that happens without explicitly setting it to null when I think I'm done with it?

pguardiario asked May 29 '26 14:05



1 Answer

The object returned by .root() seems to be collected more readily if you don't store the reference on a class instance. Memory usage still looks fairly leaky, though, as the full amount of heap allocated is never reclaimed. The Node process as a whole also seems to use about twice as much memory as lives on the heap, presumably to take care of the native libxml code. It may be worth raising an issue on libxmljs, as this quacks like a bug.
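To compare the heap against what the process uses overall, something like the following sketch can be dropped into the test script. It relies only on Node's built-in process.memoryUsage(); the helper names (formatMB, memorySnapshot, logMemory) are made up for illustration, and the forced collection only happens if node is started with --expose-gc.

```javascript
// Sketch: compare V8 heap usage with the process's total resident memory.
// formatMB, memorySnapshot and logMemory are illustrative names,
// not part of Node or libxmljs.
function formatMB(bytes) {
  return (bytes / 1024 / 1024).toFixed(1) + " MB";
}

function memorySnapshot() {
  // global.gc is only defined when node runs with --expose-gc;
  // without the flag this simply reads the current numbers.
  if (typeof global.gc === "function") {
    global.gc();
  }
  const { rss, heapUsed, external } = process.memoryUsage();
  return { rss, heapUsed, external };
}

function logMemory(label) {
  const m = memorySnapshot();
  console.log(
    `${label}: rss=${formatMB(m.rss)} ` +
      `heapUsed=${formatMB(m.heapUsed)} external=${formatMB(m.external)}`
  );
  return m;
}

logMemory("start");
```

Calling logMemory(i) every few thousand iterations of the loop shows the trend. If rss keeps climbing while heapUsed stays roughly flat, the growth is probably outside the JS heap, which would be consistent with memory held by native code.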

Not storing the node on the class instance, but passing it through a getter, works better.

class MyObject {
  constructor(html) {
    this.doc = libxmljs.parseHtml(html);
  }
  // Look the root node up on demand instead of storing it on the instance
  get node() {
    return this.doc.root();
  }
}
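The difference between the getter and the original class is that the instance no longer holds its own reference to the node. Here is a small sketch that shows this with a stand-in parser. FakeDoc is hypothetical and is not libxmljs; it just assumes, for illustration, that root() builds a new wrapper object on each call.

```javascript
// Stand-in for a parsed document. This is NOT libxmljs; it only mimics
// a root() method that creates a new wrapper object each time it is called.
class FakeDoc {
  constructor(html) {
    this.html = html;
  }
  root() {
    return { name: "html", doc: this };
  }
}

class MyObject {
  constructor(html) {
    this.doc = new FakeDoc(html);
  }
  // Nothing is stored: the node is fetched from the document on each access
  get node() {
    return this.doc.root();
  }
}

const obj = new MyObject("<html></html>");
console.log(Object.keys(obj)); // own properties of the instance: [ 'doc' ]
console.log(obj.node === obj.node); // false, each access yields a new wrapper
```

With this stand-in, the only own property on the instance is doc, so a node wrapper can be collected as soon as the caller drops it. Whether libxmljs behaves the same way internally is an assumption.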

Using a plain object works better too.

function myObject(html) {
  const doc = libxmljs.parseHtml(html);
  const node = doc.root();
  // Return a plain object rather than a class instance
  return {
    doc: doc,
    node: node,
  };
}

As an alternative, maybe try one of the JS-based parsers.

Matt answered May 31 '26 06:05



