Zero downtime deployment of Slack bot

Tags: parallel-processing, mutex, deployment, slack, devops

We are developing a bot with BotKit, and we are now trying to minimize downtime during deployment.

We have a server with a Docker container running on it. Inside the container runs a bot-app instance connected to the Slack RTM server. When I deploy a new version (v2) of bot-app, I want zero downtime: users should never see "bot is offline".

[Image: deployment timeline]

The deploy script starts a second Docker container with the new version of bot-app, which also connects to the RTM server. As a result, for a few seconds both apps are running, both are connected to the RTM server, and both respond to user commands, so a user sees two answers to a single command.

What is the best approach if, on the one hand, we want zero downtime and, on the other, we want to prevent a user from interacting with two instances at the same time?

Decision 1: Accept a small chance of a collision, where both instances respond to the same user command.

Decision 2: Abandon zero-downtime deployment. The deploy script first stops the current Docker container and then starts the new one. The app will not respond to any user commands sent between stopping the current version and the new version being fully started.

Decision 3: Coordinate the current and new versions while they run in parallel (a handover message or a mutex). General scheme:

1) The current version of the app is running.
2) The deploy script starts the new version of the app.
3) When the new version is almost up and ready to connect to the RTM server, it sends the current version a command to close its RTM connection.
4) The current version closes its RTM connection.
5) The new version opens its RTM connection.
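A minimal sketch of the Decision 3 handover, assuming a hypothetical control channel between the two versions (modelled here with in-process queues; in practice it might be an HTTP endpoint or a message on a shared broker). `BotInstance`, `control_loop`, and `hand_over` are illustrative names, not BotKit APIs, and `connect_rtm`/`close_rtm` only record what a real RTM client would do.

```python
import queue
import threading


class BotInstance:
    """Hypothetical stand-in for one bot-app container; it only logs RTM state changes."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.rtm_connected = False
        # Hypothetical control channel; a real deployment would use HTTP, a socket, or a broker.
        self.control = queue.Queue()

    def connect_rtm(self):
        self.rtm_connected = True
        self.log.append(f"{self.name}: RTM connected")

    def close_rtm(self):
        self.rtm_connected = False
        self.log.append(f"{self.name}: RTM closed")

    def control_loop(self):
        # Runs inside the current version and waits for a handover request.
        command, ack = self.control.get()
        if command == "close_rtm":
            self.close_rtm()   # step 4: current version drops its RTM connection
            ack.put("closed")  # ...and only then acknowledges


def hand_over(current, new, timeout=5.0):
    """Steps 3-5: ask the current version to disconnect, wait for the ack, then connect."""
    ack = queue.Queue()
    current.control.put(("close_rtm", ack))
    # Raises queue.Empty if the current version never answers, so the deploy can abort.
    ack.get(timeout=timeout)
    new.connect_rtm()          # step 5: new version opens its RTM connection


log = []
v1 = BotInstance("v1", log)
v2 = BotInstance("v2", log)
v1.connect_rtm()                                   # step 1: v1 is serving
worker = threading.Thread(target=v1.control_loop)  # v1 listens for handover requests
worker.start()
hand_over(v1, v2)                                  # steps 3-5, driven by v2 once it is ready
worker.join()
print(log)  # ['v1: RTM connected', 'v1: RTM closed', 'v2: RTM connected']
```

The ordering means the two versions are never connected at the same time, at the cost of a short gap between steps 4 and 5 while the new version opens its connection. If the new version then fails to connect, the deploy script would have to tell the old one to reconnect; that rollback path is not shown here.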

I think there are other good solutions.

How would you have solved this problem in your application?


asked Apr 16 '16 20:04

vovan

1 Answer

(Sorry for the second reply; had another idea.)

The approach I described earlier would be fairly disruptive to your existing code, since you'd probably need to stop using BotKit (or at least stop using it for the RTM API communication). A less disruptive approach would be to use some external mechanism to record that a given message has already been processed.

For example, using Redis, have the bot run the following command when a message comes in:

SET message:<message timestamp> 1 NX PX 30000

The NX option means this command will only succeed if the key doesn't already exist. So the first instance of the bot that manages to execute this will succeed, and the other instance will fail. The bot should only process the message and respond if this command succeeded.

(The PX 30000 sets a 30-second expiration so Redis doesn't get full of these keys.)
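A minimal sketch of the SET NX PX check in Python. So that it runs without a server, `InMemoryRedis` is a hypothetical stand-in that mimics only the NX and PX behaviour described above; with the redis-py client the equivalent call would be `r.set(key, "1", nx=True, px=30000)`, which returns `True` when the key was set and `None` when NX blocked it. Keying on the channel as well as the timestamp is an added assumption here, since Slack message timestamps are only unique within a channel; the channel ID and timestamp in the usage lines are made up.

```python
import time


class InMemoryRedis:
    """Hypothetical stand-in for the subset of Redis SET used here (NX + PX).

    Not thread-safe; real Redis performs the check-and-set atomically on the server,
    which is what makes it safe across separate bot containers.
    """

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry time in seconds)
        self._clock = clock

    def set(self, key, value, nx=False, px=None):
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and entry[1] <= now:
            entry = None     # an expired key behaves as if it were absent
        if nx and entry is not None:
            return None      # Redis replies nil when NX prevents the write
        expires_at = now + px / 1000.0 if px is not None else None
        self._data[key] = (value, expires_at)
        return True


def claim_message(store, channel, ts, ttl_ms=30000):
    """Return True only for the first instance that claims this message."""
    key = f"message:{channel}:{ts}"
    return store.set(key, "1", nx=True, px=ttl_ms) is True


def handle_message(store, instance, channel, ts, replies):
    # Process and respond only if this instance won the claim.
    if claim_message(store, channel, ts):
        replies.append((instance, ts))


store = InMemoryRedis()
replies = []
# During the overlap both versions receive the same (hypothetical) message.
for instance in ("v1", "v2"):
    handle_message(store, instance, "C024BE91L", "1460837000.000002", replies)
print(replies)  # [('v1', '1460837000.000002')]
```

Whichever instance's SET reaches Redis first wins; the other gets nil back and stays silent, so the user sees one reply even while both containers are connected.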

This should let you do zero-downtime upgrades by overlapping the running bot instances, without having to worry about a message being processed twice.

Note that it's still possible in this scheme for a message to be dropped altogether if a bot is shut down non-gracefully: it could die just after calling the SET command but before it has actually dealt with the message. A real queue with a two-phase "get/delete" would be better, but then you're back to my other answer. :-)
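For comparison, a minimal in-memory sketch of the two-phase "get/delete" idea, assuming lease-based semantics similar to Amazon SQS's visibility timeout. `TwoPhaseQueue` is a hypothetical class, not a specific product's API: `get()` hides a message for a lease period instead of removing it, and only `delete()` removes it for good, so a bot that dies between the two steps leaves the message to be redelivered. The usage lines drive a simulated clock so the lease expiry can be shown without sleeping.

```python
import time
from collections import OrderedDict


class TwoPhaseQueue:
    """Hypothetical in-memory queue: get() leases a message, delete() acknowledges it."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self._messages = OrderedDict()  # message id -> body, in arrival order
        self._leased_until = {}         # message id -> time the current lease expires
        self._lease = lease_seconds
        self._clock = clock
        self._next_id = 0

    def put(self, body):
        self._next_id += 1
        self._messages[self._next_id] = body
        return self._next_id

    def get(self):
        """Phase 1: lease the oldest message that is not currently leased, or return None."""
        now = self._clock()
        for msg_id, body in self._messages.items():
            if self._leased_until.get(msg_id, float("-inf")) <= now:
                self._leased_until[msg_id] = now + self._lease
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Phase 2: call only after the message has been fully handled."""
        self._messages.pop(msg_id, None)
        self._leased_until.pop(msg_id, None)


now = [0.0]  # simulated clock, advanced by hand below
q = TwoPhaseQueue(lease_seconds=30.0, clock=lambda: now[0])
q.put("deploy status?")
first = q.get()        # v1 leases the message, then crashes before delete()
print(q.get())         # None: the message is hidden while v1's lease is active
now[0] = 31.0          # v1's lease runs out
second = q.get()       # v2 receives the same message and handles it
q.delete(second[0])
print(q.get())         # None: the message is gone for good
```

The trade-off is at-least-once delivery: a bot that replies and then dies before `delete()` will have the message redelivered, so a duplicate reply is possible in that case, the mirror image of the dropped-message risk described above.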


answered Oct 20 '22 22:10

user94559
