
node.js process out of memory in http.request loop

Tags:

node.js

I can't figure out why my node.js server runs out of memory. It makes a remote HTTP request for each HTTP request it receives, so I've tried to replicate the problem with the sample script below, which also runs out of memory.

This only happens when the number of iterations in the for loop is very high.

From my point of view, the problem is that node.js is queueing all of the remote HTTP requests. How can I avoid this?

This is the sample script:

(function() {
  var http, i, mypost, post_data;
  http = require('http');
  post_data = 'signature=XXX%7CPSFA%7Cxxxxx_value%7CMyclass%7CMysubclass%7CMxxxxx&schedule=schedule_name_6569&company=XXXX';
  mypost = function(post_data, cb) {
    var post_options, req;
    post_options = {
      host: 'myhost.com',
      port: 8000,
      path: '/set_xxxx',
      method: 'POST',
      headers: {
        'Content-Length': post_data.length
      }
    };
    req = http.request(post_options, function(res) {
      var res_data;
      res.setEncoding('utf-8');
      res_data = '';
      res.on('data', function(chunk) {
        return res_data += chunk;
      });
      return res.on('end', function() {
        return cb();
      });
    });
    req.on('error', function(e) {
      return console.error('TM problem with request: ' + e.message);
    });
    req.write(post_data);
    return req.end();
  };
  for (i = 1; i <= 1000000; i++) {
    mypost(post_data, function() {});
  }
}).call(this);


$ node -v
v0.4.9
$ node sample.js
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory

Thanks in advance

gulden PT

asked Jul 08 '11


1 Answer

Constraining the flow of requests into the server

It's possible to prevent overload of the built-in Server and its HTTP/HTTPS variants by setting the maxConnections property on the instance. Setting this property will cause node to stop accept()ing connections and force the operating system to drop requests when the listen() backlog is full and the application is already handling maxConnections requests.
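
For example, a minimal sketch (the port number and the trivial handler are placeholders, not taken from the question):

var http = require('http');

var server = http.createServer(function (req, res) {
  // Placeholder handler; a real application would do its work here,
  // e.g. the outgoing request from the question.
  res.writeHead(200);
  res.end('ok');
});

// Cap the number of concurrently handled connections. Beyond this count node
// stops servicing new sockets, so excess clients back up in the OS listen()
// backlog and are dropped once it overflows, as described above.
server.maxConnections = 100;

server.listen(8000);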

Throttling outgoing requests

Sometimes, it's necessary to throttle outgoing requests, as in the example script from the question.
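
Applied directly to the script in the question, the simplest form of throttling is to cap the number of in-flight requests and start the next one only when an earlier one completes. The sketch below reuses mypost and post_data from the question's script (and assumes req.end() is actually called so each request finishes); the limit of 10 and the next() helper are illustrative, not part of the original code:

var pending = 1000000; // total requests still to be sent
var inFlight = 0;      // requests currently awaiting a response
var LIMIT = 10;        // illustrative concurrency cap

var next = function () {
  while (inFlight < LIMIT && pending > 0) {
    pending--;
    inFlight++;
    mypost(post_data, function () {
      inFlight--;
      next(); // a completed request frees a slot for the next one
    });
  }
};

next();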

Using node directly or using a generic pool

As the question demonstrates, using the node network subsystem directly without any checks can result in out-of-memory errors. A library like node-pool makes active pool management attractive, but it doesn't solve the fundamental problem of unconstrained queuing, because node-pool provides no feedback about the state of the client pool.

UPDATE: As of v1.0.7 node-pool includes a patch inspired by this post to add a boolean return value to acquire(). The code in the following section is no longer necessary and the example with the streams pattern is working code with node-pool.
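
The snippets below all assume a pool object created with node-pool. A rough sketch of such a pool follows; the dummy factory is illustrative, and the exact option names have varied across generic-pool versions:

var poolModule = require('generic-pool'); // node-pool is published on npm as generic-pool

var pool = poolModule.Pool({
  name: 'demo',
  max: 2, // at most two clients may be checked out at once
  create: function (callback) {
    // A real factory would hand back an HTTP client, socket or database
    // connection; a plain object keeps the sketch self-contained.
    callback(null, { createdAt: Date.now() });
  },
  destroy: function (client) {
    // Nothing to tear down for the dummy client.
  }
});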

Cracking open the abstraction

As demonstrated by Andrey Sidorov, a solution can be reached by tracking the queue size explicitly and mingling the queuing code with the requesting code:

var useExplicitThrottling = function () {
  var active = 0
  var remaining = 10
  var queueRequests = function () {
    while(active < 2 && --remaining >= 0) {
      active++;
      pool.acquire(function (err, client) {
        if (err) {
          console.log("Error acquiring from pool")
          if (--active < 2) queueRequests()
          return
        }
        console.log("Handling request with client " + client)
        setTimeout(function () {
          pool.release(client)
          if(--active < 2) {
            queueRequests()
          }
        }, 1000)
      })
    }
  }
  queueRequests()
  console.log("Finished!")
}

Borrowing the streams pattern

The streams pattern is a solution which is idiomatic in node. Streams have a write operation which returns false when the stream cannot buffer more data. The same pattern can be applied to a pool object with acquire() returning false when the maximum number of clients have been acquired. A drain event is emitted when the number of active clients drops below the maximum. The pool abstraction is closed again and it's possible to omit explicit references to the pool size.

var useStreams = function () {
  var queueRequests = function (remaining) {
    var full = false
    pool.once('drain', function () {
      // Re-queue only if there is still work left when the pool drains.
      if (remaining > 0) queueRequests(remaining)
    })

    while(!full && --remaining >= 0) {
      console.log("Sending request...")
      full = !pool.acquire(function (err, client) {
        if (err) {
          console.log("Error acquiring from pool")
          return
        }
        console.log("Handling request with client " + client)
        setTimeout(pool.release, 1000, client)
      })
    }
  }
  queueRequests(10)
  console.log("Finished!")
}

Fibers

An alternative solution can be obtained by providing a blocking abstraction on top of the queue. The fibers module exposes coroutines that are implemented in C++. By using fibers, it's possible to block an execution context without blocking the node event loop. While I find this approach to be quite elegant, it is often overlooked in the node community because of a curious aversion to all things synchronous-looking. Notice that, excluding the callcc utility, the actual loop logic is wonderfully concise.

var Fiber = require('fibers')

/* This is the call-with-current-continuation found in Scheme and other
 * Lisps. It captures the current call context and passes a callback to
 * resume it as an argument to the function. Here, I've modified it to fit
 * JavaScript and node.js paradigms by making it a method on Function
 * objects and using function (err, result) style callbacks.
 */
Function.prototype.callcc = function(context  /* args... */) {
  var that = this,
      caller = Fiber.current,
      // Capture callcc's extra arguments here; inside the fiber body,
      // `arguments` would refer to the fiber's own (empty) argument list.
      args = Array.prototype.slice.call(arguments, 1),
      fiber = Fiber(function () {
        that.apply(context, args.concat(
          function (err, result) {
            if (err)
              caller.throwInto(err)
            else
              caller.run(result)
          }
        ))
      })
  process.nextTick(fiber.run.bind(fiber))
  return Fiber.yield()
}

var useFibers = function () {
  // Note: this function must itself be invoked from inside a running fiber,
  // because callcc relies on Fiber.current and Fiber.yield().
  var remaining = 10
  while(--remaining >= 0) {
    console.log("Sending request...")
    try {
      var client = pool.acquire.callcc(this)
      console.log("Handling request with client " + client)
      setTimeout(pool.release, 1000, client)
    } catch (x) {
      console.log("Error acquiring from pool")
    }
  }
  console.log("Finished!")
}
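
Since callcc suspends the current fiber with Fiber.yield(), the useFibers driver has to be invoked from inside a running fiber. A minimal way to do that (this wrapping is an assumption, not part of the original answer):

// Run the blocking-style driver inside its own fiber so that callcc's
// Fiber.current / Fiber.yield() calls have a fiber to suspend.
Fiber(useFibers).run();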

Conclusion

There are a number of correct ways to approach the problem. However, for library authors or applications that require a single pool to be shared in many contexts it is best to properly encapsulate the pool. Doing so helps prevent errors and produces cleaner, more modular code. Preventing unconstrained queuing then becomes an evented dance or a coroutine pattern. I hope this answer dispels a lot of FUD and confusion around blocking-style code and asynchronous behavior and encourages you to write code which makes you happy.
