I'm using Node.js to query some public APIs via HTTP requests, and for that I'm using the `request` module. I'm measuring the response time within my application and see that it returns the results of API queries about 2-3 times slower than "direct" requests via curl or in the browser. I also noticed that connections to HTTPS-enabled services usually take longer than plain HTTP ones, but this may be a coincidence.
I tried to optimize my `request` options, but to no avail. For example, I query
https://www.linkedin.com/countserv/count/share?url=http%3A%2F%2Fwww.google.com%2F&lang=en_US
I'm using `request.defaults` to set the overall defaults for all requests:
```javascript
var request = require('request');

var baseRequest = request.defaults({
    pool: {maxSockets: Infinity},
    jar: true,
    json: true,
    timeout: 5000,
    gzip: true,
    headers: {
        'Content-Type': 'application/json'
    }
});
```
The actual requests are made via
```javascript
// ...
var start = Date.now();
var options = {
    url: 'https://www.linkedin.com/countserv/count/share?url=http%3A%2F%2Fwww.google.com%2F&lang=en_US',
    method: 'GET'
};

baseRequest(options, function(error, response, body) {
    if (error) {
        console.log(error);
    } else {
        console.log((Date.now() - start) + ": " + response.statusCode);
    }
});
```
Does anybody see optimization potential? Am I doing something completely wrong? Thanks in advance for any advice!
As is, and depending on the workload and hardware, Node.js can process upwards of 1000 requests per second, with throughput ultimately limited by the speed of your network card. Note that this is 1000 requests per second, not 1000 simultaneously connected clients; Node.js can handle around 10000 simultaneous clients without issue.
Since Node.js uses non-blocking I/O, the server can handle multiple requests without waiting for each one to complete, which means Node.js can handle a much higher volume of web traffic than servers built on more traditional, blocking I/O models.
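To illustrate the non-blocking point, here is a minimal sketch in which `setTimeout` stands in for network I/O (an assumption; real requests behave the same way but with variable latency). All three 100 ms waits are started before any of them finishes, so they overlap and the total is about 100 ms rather than 300 ms. `fakeIo` and `runConcurrently` are hypothetical names.

```javascript
// Simulated I/O: resolves with `label` after `ms` milliseconds.
// setTimeout stands in for a real network request (illustration only).
function fakeIo(label, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

async function runConcurrently() {
  const start = Date.now();
  // All three operations are started before any completes,
  // so their waiting time overlaps instead of adding up.
  const results = await Promise.all([
    fakeIo('a', 100),
    fakeIo('b', 100),
    fakeIo('c', 100),
  ]);
  return { results, elapsedMs: Date.now() - start };
}

runConcurrently().then(({ results, elapsedMs }) => {
  console.log(results, `${elapsedMs} ms`); // roughly 100 ms, not 300 ms
});
```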
How does Node.js handle multiple client requests? Node.js is built around an event-driven architecture: it receives client requests and places them into an event queue, and its event loop, an infinite loop, takes events from that queue and processes them.
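A small sketch of the ordering the event loop enforces, using only built-in timers and promises: synchronous code always runs to completion first, queued promise callbacks run next, and timer callbacks run on a later turn of the loop. `eventLoopOrder` is a hypothetical helper name.

```javascript
// Records the order in which different kinds of queued work actually run.
function eventLoopOrder() {
  const order = [];
  return new Promise((resolve) => {
    // Timer callback: runs on a later turn of the event loop.
    setTimeout(() => {
      order.push('timer callback');
      resolve(order);
    }, 0);
    // Promise callback: runs as soon as the current synchronous code finishes.
    Promise.resolve().then(() => order.push('promise callback'));
    // Synchronous code: runs immediately.
    order.push('synchronous code');
  });
}

eventLoopOrder().then((order) => console.log(order));
// [ 'synchronous code', 'promise callback', 'timer callback' ]
```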
Unlike the browser, where JavaScript is sandboxed for your safety, Node.js has full access to the system like any other native application. This means you can read from and write to the file system directly, have unrestricted access to the network, execute other software, and more.
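For instance, a Node.js script can create, write, and read files with the built-in `fs` module, which sandboxed browser JavaScript cannot do. A minimal sketch using a temporary directory (the `node-demo-` prefix, the file name, and the `roundTripFile` helper are arbitrary, hypothetical choices):

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a string to a file in a fresh temp directory and read it back.
function roundTripFile(contents) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'node-demo-'));
  const file = path.join(dir, 'hello.txt');
  try {
    fs.writeFileSync(file, contents, 'utf8');
    return fs.readFileSync(file, 'utf8');
  } finally {
    // Remove the temporary directory and the file inside it.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

console.log(roundTripFile('written from Node.js'));
```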
There are several potential issues you'll need to address given what I understand from your architecture. In no particular order they are:
`request` will always be slower than using `http` directly since, as the wise man once said, "abstraction costs". ;) In fact, to squeeze out every possible ounce of performance, I'd handle all HTTP requests using Node's `net` module directly. For HTTPS, it's not worth rewriting the `https` module. And for the record, HTTPS will always be somewhat slower than HTTP, due both to the handshake needed to exchange cryptographic keys and to the work of encrypting and decrypting the payload. I'll add more suggestions as they occur to me.
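A minimal sketch of making a request with the core `http` module instead of `request`, timed with `process.hrtime.bigint()`. To keep it self-contained, a throwaway local server stands in for the remote API (an assumption), so the measured time reflects almost no network latency; for an HTTPS endpoint such as the LinkedIn URL above, `https.get` is called the same way. `timeCoreHttpRequest` is a hypothetical name.

```javascript
const http = require('http');

function timeCoreHttpRequest() {
  return new Promise((resolve, reject) => {
    // Local stand-in for the remote API (illustration only).
    const server = http.createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ count: 42 }));
    });
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const start = process.hrtime.bigint();
      // agent: false uses a fresh, non-pooled connection (Connection: close),
      // so the server can shut down as soon as the response is done.
      http.get({ host: '127.0.0.1', port, path: '/', agent: false }, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => {
          const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
          server.close();
          resolve({ statusCode: res.statusCode, body: JSON.parse(body), elapsedMs });
        });
      }).on('error', (err) => { server.close(); reject(err); });
    });
  });
}

timeCoreHttpRequest().then((r) => console.log(`${r.elapsedMs.toFixed(2)} ms: ${r.statusCode}`));
```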
More on the topic of multiple requests to the same endpoint:
If you need to retrieve a number of resources from the same endpoint, it would be useful to segment your requests to specific workers that maintain open connections to that endpoint. In that way, you can be assured that you can get the requested resource as quickly as possible without the overhead of the initial TCP handshake.
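In Node.js, one way to keep connections open to an endpoint is an `http.Agent` created with `keepAlive: true` (`https.Agent` accepts the same option). The sketch below sends several sequential requests to a local stand-in server (an assumption, to stay self-contained) and counts how many TCP connections the server accepted: with keep-alive they share one connection, without it each request opens a new one. `countConnections` is a hypothetical name.

```javascript
const http = require('http');

function countConnections(n, keepAlive) {
  return new Promise((resolve, reject) => {
    let connections = 0;
    const server = http.createServer((req, res) => res.end('ok'));
    // Fires once per new TCP connection (i.e. once per handshake).
    server.on('connection', () => { connections += 1; });
    server.listen(0, '127.0.0.1', async () => {
      const { port } = server.address();
      // keepAlive: pool one socket and reuse it; otherwise a fresh connection per request.
      const agent = keepAlive ? new http.Agent({ keepAlive: true, maxSockets: 1 }) : false;
      try {
        for (let i = 0; i < n; i += 1) {
          await new Promise((done, fail) => {
            http.get({ host: '127.0.0.1', port, path: '/', agent }, (res) => {
              res.resume(); // drain the body so the socket can be reused
              res.on('end', done);
            }).on('error', fail);
          });
        }
        resolve(connections);
      } catch (err) {
        reject(err);
      } finally {
        if (agent) agent.destroy(); // close pooled sockets
        server.close();
      }
    });
  });
}

countConnections(5, true).then((c) => console.log(`keep-alive: ${c} connection(s)`));
countConnections(5, false).then((c) => console.log(`no keep-alive: ${c} connection(s)`));
```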
The TCP handshake is a three-step process.
Step one: client sends a SYN packet to the remote server. Step two: the remote server replies to the client with a SYN+ACK. Step three: the client replies to the remote server with an ACK.
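A minimal sketch of timing the handshake from Node.js: `net.connect()` emits `'connect'` once the client's side of the three-way handshake is complete (it has received the SYN+ACK and replied with the ACK). A local server keeps the sketch self-contained (an assumption); against a remote host the same measurement comes out at roughly one network round trip. `timeTcpHandshake` is a hypothetical name.

```javascript
const net = require('net');

function timeTcpHandshake() {
  return new Promise((resolve, reject) => {
    // Local stand-in server that simply closes each connection.
    const server = net.createServer((socket) => socket.end());
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const start = process.hrtime.bigint();
      // The callback is registered for the 'connect' event (handshake done).
      const client = net.connect({ host: '127.0.0.1', port }, () => {
        const handshakeMs = Number(process.hrtime.bigint() - start) / 1e6;
        client.destroy();
        server.close();
        resolve(handshakeMs);
      });
      client.on('error', (err) => { server.close(); reject(err); });
    });
  });
}

timeTcpHandshake().then((ms) => console.log(`TCP handshake: ${ms.toFixed(3)} ms`));
```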
Depending on the client's latency to the remote server, this can add up to (as William Proxmire once said) "real money", or in this case, delay.
From my desktop, the current latency (round-trip time measured by ping) for a 2K octet packet to www.google.com is anywhere between 37 and 227ms.
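So assuming that we can rely on a round-trip mean of 95ms (over a perfect connection), each leg of the handshake takes about half of that, roughly 47.5ms, and the three legs SYN + SYN+ACK + ACK add up to about 142ms. Strictly, the client can send its first request right after its ACK, so it waits about one round trip (95ms). Either way, that is around a tenth of a second just to establish the initial connection.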
If the connection requires retransmission, it could take much longer.
And this is assuming you retrieve a single resource over a new TCP connection.
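To put rough numbers on this, here is a back-of-the-envelope model (an assumption: it ignores server processing time, bandwidth and retransmissions) using the 95ms mean round trip from above. A new connection costs about one round trip before the client can send its request, and each request/response exchange costs one more. For HTTPS, the TLS handshake adds further round trips on top of this (one with TLS 1.3, two with TLS 1.2). `estimateMs` is a hypothetical helper.

```javascript
// Rough latency model: handshakes * RTT + requests * RTT (TLS not included).
function estimateMs(rttMs, requests, reuseConnection) {
  const handshakes = reuseConnection ? 1 : requests;
  return handshakes * rttMs + requests * rttMs;
}

const rtt = 95; // mean round-trip time from the answer, in ms

// All three legs of the handshake, each about half a round trip:
console.log(3 * (rtt / 2)); // 142.5

console.log(estimateMs(rtt, 10, false)); // 1900 (new connection per request)
console.log(estimateMs(rtt, 10, true));  // 1045 (one connection reused)
```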
To ameliorate this, I'd have your workers keep a pool of open connections to "known" destinations which they would then advertise back to the supervisor process so it could direct requests to the least loaded server with a "live" connection to the target server.
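The supervisor's routing decision could look something like the sketch below. The shape of the worker reports (`load`, plus a `liveHosts` set of hosts the worker holds open connections to) and the `pickWorker` name are hypothetical; how workers send these reports (for example over IPC) is left out.

```javascript
// Prefer the least loaded worker that already has a live connection to the
// target host; otherwise fall back to the least loaded worker overall.
function pickWorker(workers, targetHost) {
  const byLoad = (a, b) => a.load - b.load;
  const warm = workers.filter((w) => w.liveHosts.has(targetHost)).sort(byLoad);
  if (warm.length > 0) return warm[0]; // reuse an open connection, no new handshake
  return [...workers].sort(byLoad)[0];
}

// Hypothetical worker reports.
const workers = [
  { id: 1, load: 5, liveHosts: new Set(['www.linkedin.com']) },
  { id: 2, load: 1, liveHosts: new Set(['api.example.com']) },
  { id: 3, load: 3, liveHosts: new Set(['www.linkedin.com']) },
];

console.log(pickWorker(workers, 'www.linkedin.com').id); // 3
console.log(pickWorker(workers, 'other.example.org').id); // 2
```

Preferring a "warm" worker even when it is busier trades some load balance for skipping the handshake; a real supervisor might cap how much extra load it tolerates before opening a new connection elsewhere.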