In case someone wants to try: https://github.com/codependent/cluster-performance
I am testing the requests-per-second limit of Node.js (v0.11.13, Windows 7) with a simple application. I have implemented a service with Express 4 that simulates an I/O operation, such as a DB query, with a setTimeout callback.
First I test it with a single Node process. For the second test I start as many workers as the machine has CPUs.
I am using loadtest to test the service with the following parameters:
loadtest -n 50000 -c 220 -k http://localhost:5000/operations/timeout/20
That is to say, 50,000 total requests from 220 concurrent clients.
My service sets the timeout (the duration of the simulated processing) according to the last URL parameter (20 ms):
router.route('/timeout/:time')
    .get(function(req, res) {
        setTimeout(function() {
            appLog.debug("Timeout completed %d", process.pid);
            res.json(200, {result: process.pid});
        }, req.param('time'));
    });
These are the results:
INFO Max requests: 50000
INFO Concurrency level: 200
INFO Agent: keepalive
INFO
INFO Completed requests: 50000
INFO Total errors: 0
INFO Total time: 19.326443741 s
INFO Requests per second: 2587
INFO
INFO Percentage of the requests served within a certain time
INFO 50% 75 ms
INFO 90% 92 ms
INFO 95% 100 ms
INFO 99% 117 ms
INFO 100% 238 ms (longest request)
2587 requests per second, not bad.
In this case I distribute the load equally among the workers using the round-robin scheduling policy. Since there are now 8 cores processing requests, I was expecting a significant improvement (8 times faster?) in requests per second, but it only increased to 2905 rps (318 rps more). How can you explain that? Am I doing something wrong?
Results:
Max requests: 50000
Concurrency level: 220
Agent: keepalive
Completed requests: 50000
Total errors: 0
Total time: 17.209989764000003 s
Requests per second: 2905
Percentage of the requests served within a certain time
50% 69 ms
90% 103 ms
95% 112 ms
99% 143 ms
100% 284 ms (longest request)
My cluster initialization code:
#!/usr/bin/env node
var nconf = require('../lib/config');
var app = require('express')();
var debug = require('debug')('mma-nodevents');
var http = require("http");
var appConfigurer = require('../app');
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if ('v0.11.13'.localeCompare(process.version) >= 0) {
    cluster.schedulingPolicy = cluster.SCHED_RR;
}

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
        cluster.fork();
    });
} else {
    console.log("starting worker [%d]", process.pid);
    appConfigurer(app);
    var server = http.createServer(app);
    server.listen(nconf.get('port'), function() {
        debug('Express server listening on port ' + nconf.get('port'));
    });
}

module.exports = app;
UPDATE:
I have finally accepted slebetman's answer, since he was right about why cluster performance didn't increase significantly with up to 8 processes in this case. However, I would like to point out an interesting fact: with the current io.js version (2.4.0), performance has really improved, even for this I/O-bound operation (setTimeout):
loadtest -n 50000 -c 220 -k http://localhost:5000/operations/timeout/20
Single thread:
Max requests: 50000
Concurrency level: 220
Agent: keepalive
Completed requests: 50000
Total errors: 0
Total time: 13.391324847 s
Requests per second: 3734
Percentage of the requests served within a certain time
50% 57 ms
90% 67 ms
95% 74 ms
99% 118 ms
100% 230 ms (longest request)
8 core cluster:
Max requests: 50000
Concurrency level: 220
Agent: keepalive
Completed requests: 50000
Total errors: 0
Total time: 8.253544166 s
Requests per second: 6058
Percentage of the requests served within a certain time
50% 35 ms
90% 47 ms
95% 52 ms
99% 68 ms
100% 178 ms (longest request)
So it's clear that with the current io.js/node.js releases, although you don't get an 8x rps increase, throughput improves by a factor of about 1.6 (6058 vs 3734 rps).
On the other hand, as expected, when using a for loop that iterates for the number of milliseconds indicated in the request (thus blocking the event loop), the rps increases roughly proportionally to the number of worker processes.
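That blocking variant can be sketched like this (the busyWait helper is hypothetical, not code from the question's repository). Spinning for the requested number of milliseconds occupies both the CPU and the event loop, so a single process can serve at most roughly 1000/time requests per second, and each extra worker process multiplies that ceiling:

```javascript
// Busy-wait for `ms` milliseconds, keeping the CPU and the event loop occupied.
// Unlike setTimeout, nothing else can run in this process while it spins.
function busyWait(ms) {
    var end = Date.now() + ms;
    while (Date.now() < end) {
        // spin
    }
}

var t0 = Date.now();
busyWait(20);
var elapsed = Date.now() - t0;
console.log('blocked for ~' + elapsed + ' ms');
```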
I/O operations are exactly the kind of workload Node.js was designed and optimized for. I/O operations (and setTimeout) essentially run in parallel, as much as the hardware (network, disk, PCI bridge, DMA controller etc.) allows.
Once you realize this, it's easy to understand why running many parallel I/O operations in a single process takes roughly the same amount of time as running them across many processes or threads. Indeed, the direct analog would be: running many parallel I/O operations in one process is exactly the same as running single blocking I/O operations in many parallel processes.
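This is easy to verify directly: in a single process, a thousand overlapping 20 ms timers finish in roughly 20 ms of wall-clock time, not 1000 × 20 ms. A minimal sketch (the 20 ms value is taken from the question's test):

```javascript
var t0 = Date.now();
var pending = 1000;
var elapsedMs = -1;

// Start 1000 concurrent 20 ms "I/O operations". They all overlap on the
// same event loop, so the total wall-clock time stays close to 20 ms.
for (var i = 0; i < 1000; i++) {
    setTimeout(function() {
        if (--pending === 0) {
            elapsedMs = Date.now() - t0;
            console.log('1000 timers completed in ' + elapsedMs + ' ms');
        }
    }, 20);
}
```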
Clustering allows you to use multiple CPUs/cores if you have them. But your process barely uses any CPU cycles — it spends almost all of its time waiting for timers — so clustering gives you very little advantage (if any).
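One way to sanity-check that the load generator, not the CPUs, is the ceiling here is Little's law (throughput ≈ concurrency / latency). A back-of-the-envelope estimate using the numbers reported for the cluster run above:

```javascript
// Little's law: with a fixed number of in-flight requests, throughput is
// bounded by concurrency divided by per-request latency.
var concurrency = 220;        // -c flag passed to loadtest
var medianLatencyS = 0.069;   // 50th-percentile latency of the cluster run
var expectedRps = concurrency / medianLatencyS;
console.log(Math.round(expectedRps) + ' rps'); // ~3188, close to the observed 2905
```

The observed 2905 rps is close to this bound, which suggests the 220 concurrent clients, not the 8 cores, limit the measured throughput.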