
Lots of parallel http requests in node.js

I've created a Node.js script that scans a network for available HTTP pages, so there are a lot of connections I want to run in parallel, but it seems that some of the requests wait for previous ones to complete.

Following is the code fragment:

    var reply = { };
    reply.started = new Date().getTime();
    var req = http.request(options, function(res) {
        reply.status = res.statusCode;
        reply.rawHeaders = res.headers;
        reply.headers = JSON.stringify(res.headers);
        reply.body = '';
        res.setEncoding('utf8');
        res.on('data', function (chunk) {
            reply.body += chunk;
        });
        res.on('end', function () {
            reply.finished = new Date().getTime();
            reply.time = reply.finished - reply.started;
            callback(reply);
        });
    });
    req.on('error', function(e) {
        if(e.message == 'socket hang up') {
            return;
        }
        errCallback(e.message);
    });
    req.end();

This code performs only 10-20 requests per second, but I need 500-1k requests per second. Every queued request is made to a different HTTP server.

I've tried something like this, but it didn't help:

    http.globalAgent.maxSockets = 500;

druidvav asked Jun 28 '13 19:06




2 Answers

Something else must be going on with your code. Node can comfortably handle 1k+ requests per second.

I tested with the following simple code:

var http = require('http');

var total = 1000;
var results = [];
var done = 0;

// Make 1000 parallel requests:
for (var i = 0; i < total; i++) {
    http.request({
        host: '127.0.0.1',
        path: '/'
    }, function (res) {
        results.push(res.statusCode);
        res.resume(); // discard the body so the socket is released
        done++;

        if (done === total) { // last response received
            console.log(JSON.stringify(results));
        }
    }).end();
}

To test purely what Node is capable of, rather than my home broadband connection, the code sends its requests to a local Nginx server. I also avoid console.log until all the requests have returned, because it is implemented as a synchronous function (to avoid losing debugging messages when a program crashes).

Running the code using time I get the following results:

real    0m1.093s
user    0m0.595s
sys     0m0.154s

That's 1.093 seconds for 1000 requests, or about 915 requests per second, which is very close to 1k requests per second.


The simple code above will generate OS errors if you try to make a lot of requests (say, 10000 or more), because Node will happily try to open all those sockets in the for loop (remember: the requests don't start until the for loop ends; they are only created). You mentioned that your solution also runs into the same errors. To avoid this, you should limit the number of parallel requests you make.

The simplest way to limit the number of parallel requests is to use one of the Limit functions from the async.js library:

var http = require('http');
var async = require('async');

var requests = [];

// Build a large list of requests:
for (var i = 0; i < 10000; i++) {
    requests.push(function (callback) {
        http.request({
            host: '127.0.0.1',
            path: '/'
        }, function (res) {
            res.resume(); // discard the body so the socket is released
            callback(null, res.statusCode);
        }).on('error', callback) // report connection failures to async
          .end();
    });
}

// Make the requests, 100 at a time
async.parallelLimit(requests, 100, function (err, results) {
    if (err) { return console.error(err); }
    console.log(JSON.stringify(results));
});

Running this with time on my machine I get:

real    0m8.882s
user    0m4.036s
sys     0m1.569s

So that's 10k requests in around 9 seconds, or roughly 1.1k/s.

Look at the functions available from async.js.
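
If pulling in async.js is not an option, the same idea of limiting parallel requests can be sketched by hand. The following is a minimal illustration, not the async.js implementation: `runLimited` is a hypothetical helper name, and it assumes each task is a function that takes a Node-style `callback(err, result)`, like the entries of the `requests` array above.

```javascript
// Minimal concurrency limiter with no external dependencies (illustrative sketch).
// `tasks` is an array of functions, each taking a Node-style callback(err, result).
function runLimited(tasks, limit, done) {
    var results = new Array(tasks.length);
    var next = 0;         // index of the next task to start
    var running = 0;      // number of tasks currently in flight
    var finished = false; // guards against calling `done` more than once

    if (tasks.length === 0) {
        return done(null, results);
    }

    function launch() {
        // Start tasks until the limit is reached or none are left
        while (running < limit && next < tasks.length) {
            start(next++);
        }
    }

    function start(index) {
        running++;
        tasks[index](function (err, result) {
            if (finished) { return; }
            running--;
            if (err) {
                finished = true;
                return done(err);
            }
            results[index] = result; // keep results in task order
            if (next === tasks.length && running === 0) {
                finished = true;
                return done(null, results);
            }
            launch();
        });
    }

    launch();
}
```

It would be called as `runLimited(requests, 100, function (err, results) { ... })`. In this sketch the first error stops new tasks from being launched and is passed straight to the final callback; a network scanner would more likely want to record the failure as a result and carry on.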

slebetman answered Sep 19 '22 16:09


I've found a solution that works for me. It is not very good, but it works:

var childProcess = require('child_process');

I'm using curl:

childProcess.exec('curl --max-time 20 --connect-timeout 10 -iSs "' + options.url + '"', function (error, stdout, stderr) {
    // handle error, stdout and stderr here
});

This allows me to run 800-1000 curl processes simultaneously. Of course, this solution has its weaknesses, such as requiring lots of open file descriptors, but it works.

I also tried the node-curl bindings, but they were very slow too.
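
One caveat with the curl approach above is that the command is built by string concatenation and run through a shell, so a URL containing quotes or other shell metacharacters can break or alter the command. A possible variation, sketched here under the assumption that `curl` is on the PATH, uses `child_process.execFile`, which passes arguments as an array without going through a shell. The names `curlArgs` and `fetchWithCurl` are hypothetical helpers, not part of the original solution.

```javascript
var childProcess = require('child_process');

// Build the curl argument list; each element is passed to curl as-is, without a shell
function curlArgs(url) {
    return ['--max-time', '20', '--connect-timeout', '10', '-iSs', url];
}

// Hypothetical helper: fetch one URL and hand the raw output to a Node-style callback
function fetchWithCurl(url, callback) {
    childProcess.execFile('curl', curlArgs(url), function (error, stdout, stderr) {
        if (error) {
            // With -S, curl writes its error message to stderr
            return callback(new Error(stderr || error.message));
        }
        callback(null, stdout); // response headers (-i) followed by the body
    });
}
```

This still spawns one process per request, so the file descriptor requirement mentioned above is unchanged. Large responses may also exceed the default `maxBuffer` of `execFile`, which can be raised through its options argument.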

druidvav answered Sep 22 '22 16:09