Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is making sequential HTTP requests a blocking operation in node?

Note that irrelevant information to my question will be 'quoted'

like so (feel free to skip these).

Problem

I am using node to make in-order HTTP requests on behalf of multiple clients. This way, what originally took the client(s) several different page loads to get the desired result, now only takes a single request via my server. I am currently using the ‘async’ module for flow control and ‘request’ module for making the HTTP requests. There are approximately 5 callbacks which, using console.time, takes about ~2 seconds from start to finish (sketch code included below).

Now I am rather inexperienced with node, but I am aware of the single-threaded nature of node. While I have read many times that node isn’t built for CPU-bound tasks, I didn’t really understand what that meant until now. If I have a correct understanding of what’s going on, this means that what I currently have (in development) is in no way going to scale to even more than 10 clients.

Question

Since I am not an expert at node, I ask this question (in the title) to get a confirmation that making several sequential HTTP requests is indeed blocking.

Epilogue

If that is the case, I expect I will ask a different SO question (after doing the appropriate research) discussing various possible solutions, should I choose to continue approaching this problem in node (which itself may not be suitable for what I'm trying to do).

Other closing thoughts

I am truly sorry if this question was not detailed enough, too noobish, or had particularly flowery language (I try to be concise).

Thanks and all the upvotes to anyone who can help me with my problem!

The code I mentioned earlier:

var async = require('async');
var request = require('request');

...

async.waterfall([
    function(cb) {
        console.time('1');

        request(someUrl1, function(err, res, body) {
            // load and parse the given web page.

            // make a callback with data parsed from the web page
        });
    },
    function(someParameters, cb) {
        console.timeEnd('1');
        console.time('2');

        request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
            // more computation

            // make a callback with a session cookie given by the visited url
        });
    },
    function(jar, cb) {
        console.timeEnd('2');
        console.time('3');

        request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
            // do more parsing + computation

            // make another callback with the results
        });
    },
    function(moreParameters, cb) {
        console.timeEnd('3');
        console.time('4');

        request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
            // make final callback after some more computation.
            //This part takes about ~1s to complete
        });
    }
], function (err, result) {
    console.timeEnd('4'); //
    res.status(200).send();
});
like image 642
youngrrrr Avatar asked Oct 06 '15 02:10

youngrrrr


2 Answers

Normally, I/O in node.js are non-blocking. You can test this out by making several requests simultaneously to your server. For example, if each request takes 1 second to process, a blocking server would take 2 seconds to process 2 simultaneous requests but a non-blocking server would take just a bit more than 1 second to process both requests.

However, you can deliberately make requests blocking by using the sync-request module instead of request. Obviously, that's not recommended for servers.

Here's a bit of code to demonstrate the difference between blocking and non-blocking I/O:

var req = require('request');
var sync = require('sync-request');

// Load example.com N times (yes, it's a real website):
var N = 10;

console.log('BLOCKING test ==========');
var start = new Date().valueOf();
for (var i=0;i<N;i++) {
    var res = sync('GET','http://www.example.com')
    console.log('Downloaded ' + res.getBody().length + ' bytes');
}
var end = new Date().valueOf();
console.log('Total time: ' + (end-start) + 'ms');

console.log('NON-BLOCKING test ======');
var loaded = 0;
var start = new Date().valueOf();
for (var i=0;i<N;i++) {
    req('http://www.example.com',function( err, response, body ) {
        loaded++;
        console.log('Downloaded ' + body.length + ' bytes');
        if (loaded == N) {
            var end = new Date().valueOf();
            console.log('Total time: ' + (end-start) + 'ms');
        }
    })
}

Running the code above you'll see the non-blocking test takes roughly the same amount of time to process all requests as it does for a single request (for example, if you set N = 10, the non-blocking code executes 10 times faster than the blocking code). This clearly illustrates that the requests are non-blocking.


Additional answer:

You also mentioned that you're worried about your process being CPU intensive. But in your code, you're not benchmarking CPU utility. You're mixing both network request time (I/O, which we know is non-blocking) and CPU process time. To measure how much time the request is in blocking mode, change your code to this:

async.waterfall([
    function(cb) {
        request(someUrl1, function(err, res, body) {
            console.time('1');
            // load and parse the given web page.
            console.timeEnd('1');
            // make a callback with data parsed from the web page
        });
    },
    function(someParameters, cb) {
        request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
            console.time('2');
            // more computation
            console.timeEnd('2');

            // make a callback with a session cookie given by the visited url
        });
    },
    function(jar, cb) {
        request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
            console.time('3');
            // do more parsing + computation
            console.timeEnd('3');
            // make another callback with the results
        });
    },
    function(moreParameters, cb) {
        request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
            console.time('4');
            // some more computation.
            console.timeEnd('4');

            // make final callback
        });
    }
], function (err, result) {
    res.status(200).send();
});

Your code only blocks in the "more computation" parts. So you can completely ignore any time spent waiting for the other parts to execute. In fact, that's exactly how node can serve multiple requests concurrently. While waiting for the other parts to call the respective callbacks (you mention that it may take up to 1 second) node can execute other javascript code and handle other requests.

like image 140
slebetman Avatar answered Oct 20 '22 02:10

slebetman


Your code is non-blocking because it uses non-blocking I/O with the request() function. This means that node.js is free to service other requests while your series of http requests is being fetched.

What async.waterfall() does it to order your requests to be sequential and pass the results of one on to the next. The requests themselves are non-blocking and async.waterfall() does not change or influence that. The series you have just means that you have multiple non-blocking requests in a row.

What you have is analogous to a series of nested setTimeout() calls. For example, this sequence of code takes 5 seconds to get to the inner callback (like your async.waterfall() takes n seconds to get to the last callback):

setTimeout(function() {
    setTimeout(function() {
        setTimeout(function() {
            setTimeout(function() {
                setTimeout(function() {
                    // it takes 5 seconds to get here
                }, 1000);
            }, 1000);
        }, 1000);
    }, 1000);
}, 1000);

But, this uses basically zero CPU because it's just 5 consecutive asynchronous operations. The actual node.js process is involved for probably no more than 1ms to schedule the next setTimeout() and then the node.js process literally could be doing lots of other things until the system posts an event to fire the next timer.

You can read more about how the node.js event queue works in these references:

Run Arbitrary Code While Waiting For Callback in Node?

blocking code in non-blocking http server

Hidden threads in Javascript/Node that never execute user code: is it possible, and if so could it lead to an arcane possibility for a race condition?

How does JavaScript handle AJAX responses in the background? (written about the browser, but concept is the same)

If I have a correct understanding of what’s going on, this means that what I currently have (in development) is in no way going to scale to even more than 10 clients.

This is not a correct understanding. A node.js process can easily have thousands of non-blocking requests in flight at the same time. Your sequentially measured time is only a start to finish time - it has nothing to do with CPU resources or other OS resources consumed (see comments below on non-blocking resource consumption).

I still have concerns about using node for this particular application then. I'm worried about how it will scale considering that the work it is doing is not simple I/O but computationally intensive. I feel as though I should switch to a platform that enables multi-threading. Does what I'm asking/the concern I'm expressing make sense? I could just be spitting total BS and have no idea what I'm talking about.

Non-blocking I/O consumes almost no CPU (only a little when the request is originally sent and then a little when the result arrives back), but while the compmuter is waiting for the remove result, no CPU is consumed at all and no OS thread is consumed. This is one of the reasons that node.js scales well for non-blocking I/O as no resources are used when the computer is waiting for a response from a remove site.

If your processing of the request is computationally intensive (e.g. takes a measurable amount of pure blocking CPU time to process), then yes you would want to explore getting multiple processes involved in running the computations. There are multiple ways to do this. You can use clustering (so you simply have multiple identical node.js processes each working on requests from different clients) with the nodejs clustering module. Or, you can create a work queue of computationally intensive work to do and have a set of child processes that do the computationally intensive work. Or, there are several other options too. This not the type of problem that one needs to switch away from node.js to solve - it can be solved using node.js just fine.

like image 26
jfriend00 Avatar answered Oct 20 '22 03:10

jfriend00