 

How to scale websocket connection load as one adds/removes servers?

To explain the problem:

With HTTP:

Assume there are 100 requests/second arriving.

  1. If there are 4 servers, the load balancer (LB) can distribute the load evenly across them: 25 requests/second per server.
  2. If I add a server (5 servers total), the LB rebalances to 20 requests/second per server.
  3. If I remove a server (3 servers total), the load per server rises to 33.3 requests/second.

So the load per server is automatically rebalanced as I add/remove servers, since each connection is so short-lived.

With Websockets

Assume there are 100 clients, 2 servers (behind a LB)

  1. The LB initially balances each incoming connection evenly, so each server holds 50 connections.
  2. However, if I add a server (3 servers total), the third server gets 0 connections, since the existing 100 clients are already connected to the first 2 servers.
  3. If I remove a server (1 server total), all 100 clients will reconnect and are now served by a single server.
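The asymmetry above can be seen in a tiny round-robin simulation (plain Python; the server names are placeholders): connections are assigned once, at connect time, so a newly added server only receives traffic when clients reconnect.

```python
import itertools
from collections import Counter

def assign(clients, servers):
    """Round-robin each incoming connection across the current server pool."""
    rr = itertools.cycle(servers)
    return Counter(next(rr) for _ in range(clients))

initial = assign(100, ["s1", "s2"])           # 50 connections per server
# Adding "s3" now moves nothing: the 100 sockets stay pinned to s1/s2.
# Only when the clients reconnect does the pool actually rebalance:
rebalanced = assign(100, ["s1", "s2", "s3"])  # roughly 33-34 per server
```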

Problem

Since websocket connections are persistent, adding/removing a server does not increase/decrease load per server until the clients decide to reconnect.

How does one then efficiently scale websockets and manage load per server?

asked Mar 30 '19 by pdeva

People also ask

Can WebSockets scale?

Compared to REST, WebSockets allow for higher efficiency and are easier to scale because they do not require the HTTP request/response overhead for each message sent and received. Furthermore, the WebSocket protocol is push-based, enabling you to push data to connected clients as soon as events occur.

Would WebSockets be able to handle 1000000 concurrent connections?

With at least 30 GiB RAM you can handle 1 million concurrent sockets.

Why WebSockets are not scalable?

But why are WebSockets hard to scale? The main challenge is that connections to your WebSocket server need to be persistent. And even once you've scaled out your server nodes both vertically and horizontally, you also need to provide a solution for sharing data between the nodes.

How do you horizontally scale a WebSocket?

Horizontal scaling needs to be combined with other tools in order to build a fully scalable architecture. Using a Publish/Subscribe (pub/sub) broker is an effective method of horizontally scaling WebSockets. There are several off-the-shelf solutions, like Kafka or Redis, that can make this happen.
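As a sketch of that pub/sub approach, here is a minimal in-process broker standing in for Redis or Kafka (the class and channel names are made up for illustration): every WebSocket node subscribes to a channel, and a publish from any node reaches all nodes, each of which can then fan the message out to its locally connected clients.

```python
from collections import defaultdict

class Broker:
    """In-memory stand-in for a pub/sub backplane (Redis, Kafka, etc.)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for cb in self.subscribers[channel]:
            cb(message)

class WsNode:
    """One WebSocket server node; `received` stands in for its open sockets."""
    def __init__(self, name, broker, channel="chat"):
        self.name = name
        self.received = []
        broker.subscribe(channel, self.fan_out)

    def fan_out(self, message):
        # A real node would write the message to every locally connected socket.
        self.received.append(message)

broker = Broker()
a, b = WsNode("a", broker), WsNode("b", broker)
broker.publish("chat", "hello")  # reaches both nodes, regardless of which
                                 # node the sender was connected to
```

The point of the broker is exactly the persistent-connection problem from the question: a client pinned to node `a` can still receive messages originating on node `b`.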


1 Answer

This is similar to problems the gaming industry has been trying to solve for a long time: an area with many concurrent connections that need fast communication between many clients.

Options:

  1. Master/slave architecture, where the master retains a connection to each slave to monitor health, load, etc. When a client joins the session/application, it pings the master, and the master responds with the next server. This is essentially client-side load balancing, except you are using server-side heuristics.

This prevents your clients from blowing up a single server. You'll have to have the client poll the master before establishing the WS connection but that is simple.

This way you can also scale out to multi master if you need to and put them behind load balancers.

If you need to send a message between servers there are many options for that (handle it yourself, queues, etc).

This is how my drawing app Pixmap for Android, which I built last year, works. Works very well too.
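A minimal sketch of that master's server-selection logic (the names and the least-connections heuristic are assumptions for illustration, not the answer's exact design): slaves report their load, and the master hands each new client the least-loaded server, so a freshly added server fills up first.

```python
class Master:
    """Hypothetical coordinator: slaves report load; clients ask where to connect."""
    def __init__(self):
        self.loads = {}  # server address -> current connection count

    def register(self, addr):
        self.loads[addr] = 0

    def report(self, addr, connections):
        self.loads[addr] = connections

    def next_server(self):
        # Least-connections heuristic: a new, empty server drains traffic first.
        return min(self.loads, key=self.loads.get)

master = Master()
for addr in ("ws1:9000", "ws2:9000"):
    master.register(addr)
master.report("ws1:9000", 50)
master.report("ws2:9000", 50)
master.register("ws3:9000")    # freshly added server, load 0
target = master.next_server()  # new clients are sent to ws3:9000
```

In production this poll would be an HTTP endpoint the client hits before opening the WS connection, as described above.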

  2. Client-side load balancing, where the client connects to a random host name. This is how Watch.ly works. Each host can then be its own load balancer and cluster of servers for safety. Risky but simple.

  3. Traditional load balancing, i.e. round robin. Hard to beat HAProxy. This should be your first approach and will scale to many thousands of concurrent users. It doesn't solve the problem of redistributing load, though. One way to solve that with this setup is to push an event to your clients telling them to reconnect (and have each attempt to reconnect after a random timeout so you don't kill your servers).
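The jittered-reconnect idea at the end can be sketched as follows (the base delay and jitter window are arbitrary numbers): when the server pushes a "reconnect" control event, each client waits a random delay before reconnecting, spreading the reconnect storm over several seconds instead of hitting the servers all at once.

```python
import random

def reconnect_delay(base=1.0, jitter=5.0):
    """Seconds a client should wait after receiving a 'reconnect' event.
    The random jitter spreads reconnects out so no single instant is overloaded."""
    return base + random.uniform(0, jitter)

# 100 clients told to reconnect land roughly uniformly between 1 and 6 seconds.
delays = [reconnect_delay() for _ in range(100)]
```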

answered Sep 20 '22 by winrid