Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Node js - http.request() problems with connection pooling

Consider the following simple Node.js application:

var http = require('http');
http.createServer(function() { }).listen(8124); // Prevent process shutting down

var requestNo = 1;
var maxRequests = 2000;

function requestTest() {
    http.request({ host: 'www.google.com', method: 'GET' }, function(res) {
        console.log('Completed ' + (requestNo++));

        if (requestNo <= maxRequests) {
            requestTest();
        }
    }).end();
}

requestTest();

It makes 2000 HTTP requests to google.com, one after the other. The problem is it gets to request No. 5 and pauses for about 3 mins, then continues processing requests 6 - 10, then pauses for another 3 minutes, then requests 11 - 15, pauses, and so on. Edit: I tried changing www.google.com to localhost, an extremely basic Node.js app running my machine that returns "Hello world", I still get the 3 minute pause.

Now I read I can increase the connection pool limit:

http.globalAgent.maxSockets = 20;

Now if I run it, it processes requests 1 - 20, then pauses for 3 mins, then requests 21 - 40, then pauses, and so on.

Finally, after a bit of research, I learned I could disable connection pooling entirely by setting agent: false in the request options:

http.request({ host: 'www.google.com', method: 'GET', agent: false }, function(res) {
    ...snip....

...and it'll run through all 2000 requests just fine.

My question, is it a good idea to do this? Is there a danger that I could end up with too many HTTP connections? And why does it pause for 3 mins, surely if I've finished with the connection it should add it straight back into the pool ready for the next request to use, so why is it waiting 3 mins? Forgive my ignorance.

Failing that, what is the best strategy for a Node.js app making a potentially large number HTTP requests, without locking up, or crashing?

I'm running Node.js version 0.10 on Mac OSX 10.8.2.


Edit: I've found if I convert the above code into a for loop and try to establish a bunch of connections at the same time, I start getting errors after about 242 connections. The error is:

Error was thrown: connect EMFILE
(libuv) Failed to create kqueue (24)

...and the code...

for (var i = 1; i <= 2000; i++) {
    (function(requestNo) {
        var request = http.request({ host: 'www.google.com', method: 'GET', agent: false }, function(res) {
            console.log('Completed ' + requestNo);
        });

        request.on('error', function(e) {
            console.log(e.name + ' was thrown: ' + e.message);
        });

        request.end();
    })(i);
}

I don't know if a heavily loaded Node.js app could ever reach that many simultaneous connections.

like image 555
Sunday Ironfoot Avatar asked Mar 20 '13 19:03

Sunday Ironfoot


1 Answers

You have to consume the response.

Remember, in v0.10, we landed streams2. That means that data events don't happen until you start looking for them. So, you can do stuff like this:

http.createServer(function(req, res) {
  // this does some I/O, async
  // in 0.8, you'd lose data chunks, or even the 'end' event!
  lookUpSessionInDb(req, function(er, session) {
    if (er) {
      res.statusCode = 500;
      res.end("oopsie");
    } else {
      // no data lost
      req.on('data', handleUpload);
      // end event didn't fire while we were looking it up
      req.on('end', function() {
        res.end('ok, got your stuff');
      });
    }
  });
});

However, the flip side of streams that don't lose data when you're not reading it, is that they actually don't lose data if you're not reading it! That is, they start out paused, and you have to read them to get anything out.

So, what's happening in your test is that you're making a bunch of requests and not consuming the responses, and then eventually the socket gets killed by google because nothing is happening, and it assumes you've died.

There are some cases where it's impossible to consume the incoming message: that is, if you don't add a response event handler on a requests, or where you completely write and finish the response message on a server without ever reading the request. In those cases, we just dump the data in the garbage for you.

However, if you are listening to the 'response' event, it's your responsibility to handle the object. Add a response.resume() in your first example, and you'll see it processes on through at a reasonable pace.

like image 120
isaacs Avatar answered Sep 22 '22 23:09

isaacs