
Scaling Socket.IO to multiple Node.js processes using cluster

Tearing my hair out with this one... has anyone managed to scale Socket.IO to multiple "worker" processes spawned by Node.js's cluster module?

Let's say I have the following running on each of four worker processes (pseudocode):

    // on the server
    var express = require('express');
    var server = express();
    var socket = require('socket.io');
    var io = socket.listen(server);

    // socket.io
    io.set('store', new socket.RedisStore);

    // set-up connections...
    io.sockets.on('connection', function(socket) {

      socket.on('join', function(rooms) {
        rooms.forEach(function(room) {
          socket.join(room);
        });
      });

      socket.on('leave', function(rooms) {
        rooms.forEach(function(room) {
          socket.leave(room);
        });
      });

    });

    // Emit a message every second
    function send() {
      io.sockets.in('room').emit('data', 'howdy');
    }

    setInterval(send, 1000);

And on the browser...

    // on the client
    var socket = io.connect();
    socket.emit('join', ['room']);

    socket.on('data', function(data){
      console.log(data);
    });

The problem: Every second, I'm receiving four messages, due to four separate worker processes sending the messages.

How do I ensure the message is only sent once?

Lee Benson asked Aug 19 '13 09:08




1 Answer

Edit: In Socket.IO 1.0+, rather than setting a store with multiple Redis clients, a simpler Redis adapter module can now be used.

    var io = require('socket.io')(3000);
    var redis = require('socket.io-redis');
    io.adapter(redis({ host: 'localhost', port: 6379 }));

The example shown below would look more like this:

    var cluster = require('cluster');
    var os = require('os');

    if (cluster.isMaster) {
      // we create a HTTP server, but we do not use listen
      // that way, we have a socket.io server that doesn't accept connections
      var server = require('http').createServer();
      var io = require('socket.io').listen(server);
      var redis = require('socket.io-redis');

      io.adapter(redis({ host: 'localhost', port: 6379 }));

      setInterval(function() {
        // all workers will receive this in Redis, and emit
        io.emit('data', 'payload');
      }, 1000);

      for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }

      cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
      });
    }

    if (cluster.isWorker) {
      var express = require('express');
      var app = express();

      var http = require('http');
      var server = http.createServer(app);
      var io = require('socket.io').listen(server);
      var redis = require('socket.io-redis');

      io.adapter(redis({ host: 'localhost', port: 6379 }));
      io.on('connection', function(socket) {
        socket.emit('data', 'connected to worker: ' + cluster.worker.id);
      });

      // listen on the server Socket.IO is attached to;
      // app.listen() would create a second, separate HTTP server
      server.listen(80);
    }

If you have a master node that needs to publish to other Socket.IO processes, but doesn't accept socket connections itself, use socket.io-emitter instead of socket.io-redis.
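A minimal sketch of such a publish-only heartbeat, assuming nothing about the emitter beyond an `emit(event, payload)` method. The names `createHeartbeat` and `fakeEmitter` are hypothetical and exist only for illustration; the recording stand-in replaces a real Redis-backed emitter so the snippet runs on its own.

```javascript
// Builds a heartbeat around any object that exposes emit(event, payload).
// The emitter is injected, so this process never has to accept sockets.
function createHeartbeat(emitter, event, makePayload) {
  let sent = 0;
  return {
    // Publish one message and return how many have been sent so far.
    tick() {
      emitter.emit(event, makePayload());
      sent += 1;
      return sent;
    },
    // Publish repeatedly; returns the timer so the caller can clearInterval it.
    start(intervalMs) {
      return setInterval(() => this.tick(), intervalMs);
    },
  };
}

// Hypothetical stand-in that records calls instead of publishing to Redis.
const recorded = [];
const fakeEmitter = {
  emit: (event, payload) => recorded.push([event, payload]),
};

const heartbeat = createHeartbeat(fakeEmitter, 'data', () => 'payload');
heartbeat.tick();
heartbeat.tick();
console.log(recorded.length); // 2
```

In a real master process, the object passed in would be the emitter created by socket.io-emitter (older releases of that module were constructed with a `{ host, port }` options object, but check the README for the version you install), and `heartbeat.start(1000)` would replace the `setInterval` call.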

If you are having trouble scaling, run your Node applications with DEBUG=*. Socket.IO now uses the debug module, which will also print Redis adapter debug messages. Example output:

    socket.io:server initializing namespace / +0ms
    socket.io:server creating engine.io instance with opts {"path":"/socket.io"} +2ms
    socket.io:server attaching client serving req handler +2ms
    socket.io-parser encoding packet {"type":2,"data":["event","payload"],"nsp":"/"} +0ms
    socket.io-parser encoded {"type":2,"data":["event","payload"],"nsp":"/"} as 2["event","payload"] +1ms
    socket.io-redis ignore same uid +0ms

If your master and child processes both display the same parser messages, then your application is scaling properly.


There shouldn't be a problem with your setup if you are emitting from a single worker. What you're doing is emitting from all four workers. Redis publish/subscribe isn't duplicating the messages; each message is simply being sent four times, as you asked the application to do. Here's a simple diagram of what Redis does:

    Client  <--  Worker 1 emit -->  Redis
    Client  <--  Worker 2  <----------|
    Client  <--  Worker 3  <----------|
    Client  <--  Worker 4  <----------|

As you can see, when you emit from a worker, it publishes the emit to Redis, and the emit is mirrored by the other workers, which have subscribed to the same Redis instance. This also means you can run multiple socket servers connected to the same Redis instance, and an emit on one server will be fired on all connected servers.

With cluster, when a client connects, it will connect to one of your four workers, not all four. That also means anything you emit from that worker will only be shown once to the client. So yes, the application is scaling, but the way you're doing it, you're emitting from all four workers, and the Redis database is making it as if you were calling it four times on a single worker. If a client actually connected to all four of your socket instances, they'd be receiving sixteen messages a second, not four.
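The arithmetic above can be checked with a small in-memory simulation. This is only a sketch: an `EventEmitter` stands in for the Redis pub/sub channel, and `createWorkers` and `messagesPerClient` are hypothetical helpers, not part of Socket.IO. It also simplifies the real adapter, which delivers locally and ignores its own message coming back from Redis; the number of messages each client sees is the same either way.

```javascript
const { EventEmitter } = require('events');

// Creates `count` simulated workers that share one in-memory bus.
function createWorkers(count) {
  const bus = new EventEmitter(); // plays the role of the Redis channel
  const workers = [];
  for (let id = 1; id <= count; id++) {
    const inboxes = []; // one array of received messages per local client
    // Every worker relays each published message to its own clients.
    bus.on('broadcast', (message) => {
      inboxes.forEach((inbox) => inbox.push(message));
    });
    workers.push({
      id,
      // A client connects to exactly one worker and gets its own inbox.
      connect() {
        const inbox = [];
        inboxes.push(inbox);
        return inbox;
      },
      // Emitting publishes to the bus, so all workers deliver the message.
      emit(event, payload) {
        bus.emit('broadcast', { event, payload });
      },
    });
  }
  return workers;
}

// Connects one client per worker, lets the first `emittingWorkers` workers
// emit once, and returns how many messages each client received.
function messagesPerClient(workerCount, emittingWorkers) {
  const workers = createWorkers(workerCount);
  const inboxes = workers.map((worker) => worker.connect());
  workers
    .slice(0, emittingWorkers)
    .forEach((worker) => worker.emit('data', 'howdy'));
  return inboxes.map((inbox) => inbox.length);
}

console.log(messagesPerClient(4, 4)); // [ 4, 4, 4, 4 ]
console.log(messagesPerClient(4, 1)); // [ 1, 1, 1, 1 ]
```

With all four workers emitting, each client sees four messages per tick, and a single client holding all four connections would see the sum, sixteen. With one emitter, every client sees exactly one.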

The type of socket handling depends on the type of application you're going to have. If you're going to handle clients individually, then you should have no problem, because the connection event will only fire on one worker per client. If you need a global "heartbeat", then you could have a socket handler in your master process. Since workers die when the master process dies, you should keep the connection load off the master process and let the children handle connections. Here's an example:

    var cluster = require('cluster');
    var os = require('os');

    if (cluster.isMaster) {
      // we create a HTTP server, but we do not use listen
      // that way, we have a socket.io server that doesn't accept connections
      var server = require('http').createServer();
      var io = require('socket.io').listen(server);

      var RedisStore = require('socket.io/lib/stores/redis');
      var redis = require('socket.io/node_modules/redis');

      io.set('store', new RedisStore({
        redisPub: redis.createClient(),
        redisSub: redis.createClient(),
        redisClient: redis.createClient()
      }));

      setInterval(function() {
        // all workers will receive this in Redis, and emit
        io.sockets.emit('data', 'payload');
      }, 1000);

      for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }

      cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
      });
    }

    if (cluster.isWorker) {
      var express = require('express');
      var app = express();

      var http = require('http');
      var server = http.createServer(app);
      var io = require('socket.io').listen(server);

      var RedisStore = require('socket.io/lib/stores/redis');
      var redis = require('socket.io/node_modules/redis');

      io.set('store', new RedisStore({
        redisPub: redis.createClient(),
        redisSub: redis.createClient(),
        redisClient: redis.createClient()
      }));

      io.sockets.on('connection', function(socket) {
        socket.emit('data', 'connected to worker: ' + cluster.worker.id);
      });

      // listen on the server Socket.IO is attached to;
      // app.listen() would create a second, separate HTTP server
      server.listen(80);
    }

In the example, there are five Socket.IO instances: one in the master and four in the children. The master server never calls listen(), so there is no connection overhead on that process. However, if you call an emit on the master process, it will be published to Redis, and the four worker processes will perform the emit on their clients. This offloads the connection load to the workers, and if a worker were to die, your main application logic would be untouched in the master.

Note that with Redis, all emits, even those in a namespace or room, will be processed by the other worker processes as if you had triggered the emit from that process. In other words, if you have two Socket.IO instances with one Redis instance, calling emit() on a socket in the first worker will send the data to its clients, while worker two will do the same as if you had called the emit from that worker.
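To make the room case concrete, here is a small in-memory sketch, again with an `EventEmitter` standing in for Redis. `createWorker`, `join`, and `emitTo` are hypothetical names for illustration only, and the model assumes each worker filters a mirrored emit against its own local room membership.

```javascript
const { EventEmitter } = require('events');

// One simulated worker; `bus` plays the role of the shared Redis channel.
function createWorker(bus, id) {
  const rooms = new Map(); // room name -> inboxes of local clients in it
  // Relay a mirrored emit only to local clients that joined the room.
  bus.on('broadcast', ({ room, event, payload }) => {
    (rooms.get(room) || []).forEach((inbox) => {
      inbox.push({ event, payload, via: id });
    });
  });
  return {
    id,
    // A client connected to this worker joins a room and gets an inbox.
    join(room) {
      const inbox = [];
      if (!rooms.has(room)) rooms.set(room, []);
      rooms.get(room).push(inbox);
      return inbox;
    },
    // A room-scoped emit is published to the bus like any other emit.
    emitTo(room, event, payload) {
      bus.emit('broadcast', { room, event, payload });
    },
  };
}

const bus = new EventEmitter();
const workerOne = createWorker(bus, 1);
const workerTwo = createWorker(bus, 2);

const alice = workerOne.join('room');  // connected to worker 1
const bob = workerTwo.join('room');    // connected to worker 2
const carol = workerTwo.join('lobby'); // worker 2, different room

workerOne.emitTo('room', 'data', 'howdy');

console.log(alice.length, bob.length, carol.length); // 1 1 0
console.log(bob[0].via); // 2
```

Bob receives the message through worker two even though the emit was called on worker one, while Carol, who is outside the room, receives nothing.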

hexacyanide answered Oct 22 '22 03:10