Node.js cluster doesn't significantly improve performance

In case someone wants to try: https://github.com/codependent/cluster-performance

I am testing the requests-per-second limit of Node.js (v0.11.13, Windows 7) with a simple application. I have implemented a service with Express 4 that simulates an I/O operation, such as a DB query, with a setTimeout callback.

First I test it with a single Node process. For the second test I start as many workers as the machine has CPUs.

I am using loadtest to test the service with the following parameters:

loadtest -n 50000 -c 220 -k http://localhost:5000/operations/timeout/20

That is to say, 50k total requests, 220 concurrent clients.

My service sets the timeout (the duration of the simulated processing) from the last URL parameter (20 ms in this case):

router.route('/timeout/:time')
    .get(function (req, res) {
        // Simulate an I/O-bound operation (e.g. a DB query) that takes :time ms
        setTimeout(function () {
            appLog.debug("Timeout completed %d", process.pid);
            res.json(200, { result: process.pid });
        }, req.param('time'));
    });
1. Only one node process

These are the results:

INFO Max requests:        50000
INFO Concurrency level:   200
INFO Agent:               keepalive
INFO
INFO Completed requests:  50000
INFO Total errors:        0
INFO Total time:          19.326443741 s
INFO Requests per second: 2587
INFO Total time:          19.326443741 s
INFO
INFO Percentage of the requests served within a certain time
INFO   50%      75 ms
INFO   90%      92 ms
INFO   95%      100 ms
INFO   99%      117 ms
INFO  100%      238 ms (longest request)

2587 requests per second, not bad.

2. n workers (n = numCPUs)

In this case I distribute the load equally among the workers using the round-robin scheduling policy. Since there are now 8 cores processing requests, I was expecting a significant improvement (8 times faster?) in the requests-per-second results, but it only increased to 2905 rps (318 rps more). How can that be explained? Am I doing something wrong?

Results:

Max requests:        50000
Concurrency level:   220
Agent:               keepalive

Completed requests:  50000
Total errors:        0
Total time:          17.209989764000003 s
Requests per second: 2905
Total time:          17.209989764000003 s

Percentage of the requests served within a certain time
  50%      69 ms
  90%      103 ms
  95%      112 ms
  99%      143 ms
 100%      284 ms (longest request)

My cluster initialization code:

#!/usr/bin/env node
var nconf = require('../lib/config');
var app = require('express')();
var debug = require('debug')('mma-nodevents');
var http = require("http");
var appConfigurer = require('../app');
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

// Explicitly request round-robin scheduling (it is not the default on Windows)
if ('v0.11.13'.localeCompare(process.version) >= 0) {
    cluster.schedulingPolicy = cluster.SCHED_RR;
}

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
        cluster.fork();
    });
}else{
    console.log("starting worker [%d]",process.pid);
    appConfigurer(app);
    var server = http.createServer(app);
    server.listen(nconf.get('port'), function(){
        debug('Express server listening on port ' + nconf.get('port'));
    });

}

module.exports = app;
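
As a quick sanity check that requests really are being spread across the workers, one could count the distinct worker pids returned by the service. This is a hypothetical helper, not part of the original setup; it assumes the service is listening on port 5000 as in the loadtest command above:

// Hypothetical check: issue a batch of requests with Node's built-in http
// module and count how many times each worker pid answered.
var http = require('http');
var counts = {};
var remaining = 100;

for (var i = 0; i < 100; i++) {
    http.get('http://localhost:5000/operations/timeout/20', function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
            var pid = JSON.parse(body).result;
            counts[pid] = (counts[pid] || 0) + 1;
            if (--remaining === 0) {
                console.log('Requests answered per worker pid:', counts);
            }
        });
    });
}

With round-robin scheduling the counts should come out roughly even across the worker pids.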

UPDATE:

I have finally accepted slebetman's answer, since he is right about why cluster performance doesn't increase significantly in this case, even with up to 8 processes. However, I would like to point out an interesting fact: with the current io.js version (2.4.0), clustering really does improve throughput even for this purely I/O-bound operation (setTimeout):

loadtest -n 50000 -c 220 -k http://localhost:5000/operations/timeout/20

Single thread:

Max requests:        50000
Concurrency level:   220
Agent:               keepalive

Completed requests:  50000
Total errors:        0
Total time:          13.391324847 s
Requests per second: 3734
Total time:          13.391324847 s

Percentage of the requests served within a certain time
  50%      57 ms
  90%      67 ms
  95%      74 ms
  99%      118 ms
 100%      230 ms (longest request)

8 core cluster:

Max requests:        50000
Concurrency level:   220
Agent:               keepalive

Completed requests:  50000
Total errors:        0
Total time:          8.253544166 s
Requests per second: 6058
Total time:          8.253544166 s

Percentage of the requests served within a certain time
  50%      35 ms
  90%      47 ms
  95%      52 ms
  99%      68 ms
 100%      178 ms (longest request)

So with the current io.js/Node.js releases, although you don't get an 8x rps increase, the throughput is roughly 1.6 times higher (6058 vs 3734 rps).

On the other hand, as expected, when the handler instead runs a for loop that spins for the number of milliseconds given in the request (and thus blocks the event loop), the rps increases roughly in proportion to the number of worker processes; a sketch of such a route follows.
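
For reference, here is a minimal sketch of what such a CPU-bound route could look like. The /busy/:time path and the busy-wait loop are illustrative assumptions, not the exact code used in the test above:

// Hypothetical CPU-bound counterpart of the timeout route: it burns CPU for
// :time milliseconds instead of waiting, so the event loop is blocked and
// only adding more worker processes can raise throughput.
var express = require('express');
var router = express.Router();

router.route('/busy/:time')
    .get(function (req, res) {
        var ms = parseInt(req.params.time, 10);
        var end = Date.now() + ms;
        while (Date.now() < end) {
            // busy-wait: pure CPU work on this worker's single JavaScript thread
        }
        res.json({ result: process.pid });
    });

module.exports = router;

Load-testing this route with the same loadtest command should show the rps scaling roughly with the number of workers, unlike the setTimeout version.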

Asked Nov 06 '14 by codependent


1 Answer

I/O operations are exactly the kind of workload Node.js was designed and optimized for. I/O operations (and setTimeout) essentially run in parallel, as much as the hardware (network, disk, PCI bridge, DMA controller, etc.) allows.

Once you realize this, it's easy to understand why running many parallel I/O operations in a single process takes roughly the same amount of time as running many parallel I/O operations spread over many processes/threads. Indeed, running many parallel I/O operations in one process is directly analogous to running single blocking I/O operations in many parallel processes.

Clustering allows you to use multiple CPUs/cores if you have them. But your processes barely use any CPU cycles, so clustering gives you very little advantage (if any).
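
As a quick illustration (this snippet is not from the original post, just a sketch of the argument), firing many 20 ms timers concurrently in a single process shows that they all complete after roughly one timeout interval, not one interval per timer:

// Illustrative sketch: 10,000 concurrent 20 ms "I/O waits" (setTimeout) in a
// single process finish within a few tens of milliseconds in total, not
// 10,000 x 20 ms, because the timers count down concurrently in the event loop.
var pending = 10000;
var start = Date.now();

for (var i = 0; i < pending; i++) {
    setTimeout(function () {
        pending--;
        if (pending === 0) {
            console.log('All timers done in ' + (Date.now() - start) + ' ms');
        }
    }, 20);
}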

Answered by slebetman