
Zero downtime deployment of Slack bot

We are developing a bot with BotKit and are trying to minimize downtime during deployment.

There is a server with a Docker container running on it. Inside the container, a bot-app instance is connected to Slack's RTM server. When I deploy a new version (v2) of bot-app, I want zero downtime: users should never see "bot is offline".

(Image: timeline)

The deploy script starts a second Docker container with the new version of bot-app, which also connects to the RTM server. As a result, for a few seconds both apps are running, connected to the RTM server, and responding to user commands, so a user will see two answers to a single command.

What is the best approach if, on the one hand, we want zero downtime and, on the other hand, we want to prevent a user from interacting with two instances at the same time?

Decision 1: Accept a small chance of a collision, in which both instances respond to the same user command.

Decision 2: Abandon zero-downtime deployment. In this case, the deploy script first stops the current Docker container and then starts the new one. The app will not respond to user commands sent between the moment the current version stops and the moment the new version has fully started.

Decision 3: Run the current and new versions in parallel and have them coordinate, for example with a mutex. General scheme:
1) The current version of the app is running.
2) The deploy script starts the new version of the app.
3) When the new version is almost up and ready to connect to the RTM server, it sends the current version a command to close its RTM connection.
4) The current version closes its RTM connection.
5) The new version opens its RTM connection.
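A minimal sketch of the hand-over in Decision 3, just to make the ordering concrete. Everything here is hypothetical: `FakeRtmServer`, `BotInstance`, and `hand_over` are illustrative names, the "server" is an in-memory stand-in rather than Slack's RTM API, and the "close your connection" command is reduced to a direct method call.

```python
class FakeRtmServer:
    """Hypothetical stand-in for Slack's RTM server; not a real Slack API."""

    def __init__(self):
        self.connected = []  # bot instances that currently hold a connection

    def deliver(self, text):
        # Every connected instance receives (and answers) the message.
        return [bot.handle(text) for bot in self.connected]


class BotInstance:
    def __init__(self, version, server):
        self.version = version
        self.server = server

    def open_rtm(self):
        self.server.connected.append(self)

    def close_rtm(self):
        if self in self.server.connected:
            self.server.connected.remove(self)

    def handle(self, text):
        return f"{self.version}: {text}"


def hand_over(current, new):
    # Steps 3 and 4: the new version asks the current one to disconnect.
    current.close_rtm()
    # Step 5: only then does the new version connect.
    new.open_rtm()


server = FakeRtmServer()
v1 = BotInstance("v1", server)
v1.open_rtm()                    # step 1: current version is live
v2 = BotInstance("v2", server)   # step 2: new version started, not yet connected

before = server.deliver("hello")
hand_over(v1, v2)
after = server.deliver("hello")
print(before, after)
```

With this ordering only one instance ever answers a message. Note that there is still a short window between steps 4 and 5 in which neither instance is connected, so the downtime shrinks to roughly the connection time instead of disappearing entirely.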

I think there are other good solutions.

How would you have solved this problem in your application?

vovan asked Apr 16 '16 20:04




1 Answer

(Sorry for the second reply; had another idea.)

The approach I described earlier would be pretty disruptive to your existing code, since you'd probably need to stop using botkit (or at least not use it to do the RTM API communication). An approach that may be less disruptive would be to use some external way of signaling that a given message has already been processed.

For example, using Redis, have the bot run the following command when a message comes in:

SET message:<message timestamp> 1 NX PX 30000

The NX option means this command will only succeed if the key doesn't already exist. So the first instance of the bot that manages to execute this will succeed, and the other instance will fail. The bot should only process the message and respond if this command succeeded.

(The PX 30000 sets a 30-second expiration so Redis doesn't get full of these keys.)
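Here is a rough sketch of that check, assuming Python rather than botkit's own language. `InMemoryStore` is a hypothetical stand-in that mimics only `SET ... NX PX` so the example runs without a Redis server; `claim_message` and `handle_message` are illustrative names too. With the real redis-py client, the equivalent call would be `r.set(key, 1, nx=True, px=30000)`, which returns `True` on success and `None` when `NX` blocks the write.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for Redis; mimics only SET ... NX PX."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}    # key -> time (seconds) at which the key expires
        self._clock = clock

    def set(self, name, value, nx=False, px=None):
        now = self._clock()
        expires_at = self._expiry.get(name)
        exists = expires_at is not None and expires_at > now
        if nx and exists:
            return None      # NX: refuse to overwrite a live key
        self._expiry[name] = now + px / 1000 if px is not None else float("inf")
        return True


def claim_message(store, message_ts, ttl_ms=30000):
    # Same shape as: SET message:<message timestamp> 1 NX PX 30000
    return store.set(f"message:{message_ts}", 1, nx=True, px=ttl_ms) is True


def handle_message(store, message_ts, text, instance_name):
    if not claim_message(store, message_ts):
        return None          # another instance already claimed this message
    return f"{instance_name} replies to: {text}"


store = InMemoryStore()
first = handle_message(store, "1460836800.000002", "hello", "v1")
second = handle_message(store, "1460836800.000002", "hello", "v2")
print(first, second)
```

Both instances would run the same `handle_message` against a shared store, so whichever one reaches the store first answers and the other stays silent.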

This should let you do your zero-downtime upgrades by overlapping the running bot instances without having to worry about a message being processed twice.

Note that it's still possible in this scheme for a message to be dropped altogether if a bot is shut down in a non-graceful way. (It could die just after calling the SET command but before it's actually dealt with the message.) A real queue with a two-phase "get/delete" would be better, but then you're back to my other answer. :-)
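To illustrate what a two-phase "get/delete" buys you, here is a small in-memory sketch. `LeaseQueue` and its 30-second visibility timeout are hypothetical and only show the idea; this is not a real queue service or a suggestion of which one to use.

```python
import time


class LeaseQueue:
    """Hypothetical in-memory queue with a two-phase get/delete."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self._messages = {}  # message id -> (body, invisible_until)
        self._visibility_timeout = visibility_timeout
        self._clock = clock

    def put(self, message_id, body):
        self._messages[message_id] = (body, float("-inf"))

    def get(self):
        # Phase 1: hand the message to a worker and hide it for a while.
        now = self._clock()
        for message_id, (body, invisible_until) in self._messages.items():
            if invisible_until <= now:
                self._messages[message_id] = (body, now + self._visibility_timeout)
                return message_id, body
        return None

    def delete(self, message_id):
        # Phase 2: the worker confirms it has finished with the message.
        self._messages.pop(message_id, None)


now = [0.0]
queue = LeaseQueue(visibility_timeout=30.0, clock=lambda: now[0])
queue.put("m1", "hello")

taken = queue.get()      # a bot takes the message, then dies before delete()
hidden = queue.get()     # nobody else sees it while the lease is active
now[0] = 31.0
retaken = queue.get()    # the lease expired, so another bot can pick it up
queue.delete("m1")       # this time the work finished
print(taken, hidden, retaken)
```

Unlike the `SET ... NX` approach, a message whose handler dies is not lost here: it becomes visible again once the lease expires. The trade-off is that a slow handler can cause the message to be processed a second time.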

user94559 answered Oct 20 '22 22:10
