Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Erlang/Elixir on Docker and Hot Code Swap

One of the features of Erlang (and, by definition, Elixir) is that you can do hot code swap. However, this seems to be at odd with Docker, where you would need to stop your instances and restart new ones with new images holding the new code. This essentially seem to be what everyone does.

This being said, I also know that it is possible to use one hidden node to distribute updates to all other nodes over network. Of course, just like that is sounds like asking for trouble, but...

My question would be the following: has anyone tried and achieved with reasonable success to set up a Docker-based infrastructure for Erlang/Elixir that allowed Hot-code swapping? If so, what are the do's, don'ts and caveats?

like image 939
Marc Trudel Avatar asked Mar 23 '16 08:03

Marc Trudel


1 Answers

The story

Imagine a system to handle mobile phone calls or mobile data access (that's what Erlang was created for). There are gateway servers that maintain the user session for the duration of the call, or the data access session (I will call it the session going forward). Those server have an in-memory representation of the session for as long as the session is active (user is connected).

Now there is another system that calculates how much to charge the user for the call or the data transfered (call it PDF - Policy Decision Function). Both systems are connected in such a way that the gateway server creates a handful of TCP connections to PDF and it drops users sessions if those TCP connections go down. The gateway can handle a few hundred thousand customers at a time. Whenever there is an event that the user needs to be charged for (next data transfer, another minute of the call) the gateway notifies PDF about the fact and PDF subtracts a specific amount of money from the user account. When the user account is empty PDF notifies the gateway to disconnect the call (you've run out of money, you need to top up).

Your question

Finally let's talk about your question in this context. We want to upgrade a PDF node and the node is running on Docker. We create a new Docker instance with the new version of the software, but we can't shut down the old version (there are hundreds of thousands of customers in the middle of their call, we can't disconnect them). But we need to move the customers somehow from the old PDF to the new version. So we tell the gateway node to create any new connections with the updated node instead of the old PDF. Customers can be chatty and also some of them may have a long-running data connections (downloading Windows 10 iso) so the whole operation takes 2-3 days to complete. That's how long it can take to upgrade one version of the software to another in case of a critical bug. And there may be dozens of servers like this one, each one handling hundreds thousands of customers.

But what if we used the Erlang release handler instead? We create the relup file with the new version of the software. We test it properly and deploy to PDF nodes. Each node is upgraded in-place - the internal state of the application is converted, the node is running the new version of the software. But most importantly, the TCP connection with the gateway server has not been dropped. So customers happily continue their calls or are downloading the latest Windows iso while we are upgrading the system. All is done in 10 seconds rather than 2-3 days.

The answer

This is an example of a specific system with specific requirements. Docker and Erlang's Release Handling are orthogonal technologies. You can use either or both, it all boils down to the following:

  • Requirements
  • Cost

Will you have enough resources to test both approaches predictably and enough patience to teach your Ops team so that they can deploy the system using either method? What if the testing facility cost millions of pounds (because of the required hardware) and can use only one of those two methods at a time (because the test cycle takes days)?

The pragmatic approach might be to deploy the nodes initially using Docker and then upgrade them with Erlang release handler (if you need to use Docker in the first place). Or, if your system doesn't need to be available during the upgrade (as the example PDF system does), you might just opt for always deploying new versions with Docker and forget about release handling. Or you may as well stick with release handler and forget about Docker if you need quick and reliable updates on-the-fly and Docker would be only used for the initial deployment. I hope that helps.

like image 94
Greg Avatar answered Nov 15 '22 05:11

Greg