
Zero downtime deployment for Java apps

I am trying to build a very lightweight solution for zero downtime deployment of Java apps. For the sake of simplicity, let's assume that we have two servers. My solution is to use:

  1. On the "front" -- some load balancer (software) - I am thinking about HAProxy here.

  2. On the "back" - two servers, both running Tomcat with deployed application.

When we are about to deploy a new release:

  1. We disable one of the servers with HAProxy, so only one server (let's call it server A, which is running the old release) will be available.

  2. Deploy the new release on the other server (let's call it server B), run production unit tests (in case we have them :-) and enable server B with HAProxy, disabling server A at the same time.

  3. Now we again have only one server active (server B, with the new release). Deploy the new release on server A, and re-enable it.
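
The steps above can be sketched as a small driver loop. This is only a minimal sketch: the four callbacks (`disable`, `deploy`, `healthy`, `enable`) are hypothetical hooks, not part of any existing tool. With HAProxy they could, for example, wrap the `disable server` and `enable server` runtime commands sent to its admin socket, but that wiring is an assumption and is left out here.

```python
from typing import Callable, List


def rolling_deploy(
    servers: List[str],
    disable: Callable[[str], None],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    enable: Callable[[str], None],
) -> List[str]:
    """Upgrade servers one at a time and return the ones that were upgraded.

    All callbacks are hypothetical hooks supplied by the caller.
    """
    upgraded: List[str] = []
    for server in servers:
        disable(server)          # take the node out of rotation
        deploy(server)           # install the new release on the idle node
        if not healthy(server):  # smoke tests against the idle node
            # Stop here: this node stays out of rotation and the
            # remaining nodes keep serving the old release.
            raise RuntimeError(f"{server} failed its checks after deploy")
        enable(server)           # put the upgraded node back into rotation
        upgraded.append(server)
    return upgraded
```

Calling it with `["B", "A"]` reproduces the order described above: B is drained, upgraded and re-enabled before A is touched, so one server is always serving traffic.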

Any advice on how to improve this? How can it be automated?

Are there any ready-made solutions, or do I have to end up with my own custom scripts?

Thanks!

alexeypro asked Dec 16 '10 14:12



1 Answer

I found some interesting solutions in this article on zero downtime. I would like to highlight only a few of them.

1. A/B switch (rolling upgrade + fallback mechanism):

We should have a set of nodes in standby mode. We deploy the new version to those nodes and switch the traffic to them instantly. If we keep the old nodes in their original state, we can do an instant rollback as well. A load balancer fronts the application and is responsible for this switch upon request.

Cons: if you need X servers to run your application, you need 2X servers with this approach.
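
To make the switch and the rollback concrete, here is a toy model of the load balancer's role. It is only a sketch: the class name `ABSwitch` and the node names are made up for illustration, and a real load balancer would also drain in-flight requests before switching.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class ABSwitch:
    """Toy front end: routes to the active pool, keeps the other on standby."""

    def __init__(self, pool_a: List[str], pool_b: List[str]) -> None:
        self._pools: Dict[str, List[str]] = {"A": pool_a, "B": pool_b}
        self.active = "A"
        self._rr: Iterator[str] = cycle(pool_a)

    @property
    def standby(self) -> str:
        return "B" if self.active == "A" else "A"

    def route(self) -> str:
        """Return the node that receives the next request (round robin)."""
        return next(self._rr)

    def switch(self) -> None:
        """Flip traffic to the standby pool; calling it again is the rollback."""
        self.active = self.standby
        self._rr = cycle(self._pools[self.active])
```

The new version is deployed to whichever pool `standby` names, then `switch()` moves all traffic there. Because the old pool is left untouched, a second `switch()` acts as the instant rollback.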

2. Zero downtime

With this approach, we don’t keep a set of machines; rather, we delay the port binding. Shared resource acquisition is delayed until the application starts up. The ports are switched after the application starts, and the old version is also kept running (without an access point) to roll back instantly if needed.

3. Parallel deployment – Apache Tomcat (for web applications only):

Apache Tomcat added the parallel deployment feature in its version 7 release. It lets two versions of the application run at the same time and treats the latest version as the default.
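
As far as I understand the Tomcat documentation, a version is attached to a WAR with a `##` suffix (for example `app##001.war` and `app##002.war`; these file names are just illustrations), versions are ordered by plain string comparison, and a request that carries a session known to an older version stays on that version. The selection rule can be simulated like this. It is a sketch of the rule, not Tomcat code, and `pick_version` is a made-up name.

```python
from typing import Dict, Optional, Set


def pick_version(
    sessions_by_version: Dict[str, Set[str]],
    session_id: Optional[str] = None,
) -> str:
    """Choose which deployed version should serve a request.

    sessions_by_version maps a version string to the session ids
    that version currently holds (hypothetical data structure).
    """
    # Versions are compared as strings, so "10" sorts before "9".
    latest = max(sessions_by_version)
    if session_id is not None:
        for version, sessions in sessions_by_version.items():
            if session_id in sessions:
                return version  # existing session stays where it is
    return latest  # new or unknown sessions go to the latest version
```

Because the comparison is on strings, zero-padded versions such as `001`, `002` are a safer choice than `1`, `2`, ..., `10`.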

4. Delayed port binding:

What is proposed here is the ability to start the server without binding the port, essentially without starting the connector. Later, a separate command starts and binds the connector. Version 2 of the software can be deployed while version 1 is running and already bound. When version 2 is started later, we can unbind version 1 and bind version 2. With this approach, the node is effectively offline for only a few seconds.
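
The idea can be shown with plain sockets. This is a minimal sketch, not how Tomcat connectors are implemented: the helper names are hypothetical, both "versions" live in one process for brevity, and `SO_REUSEADDR` is set only so the port can be rebound right after the old listener is closed.

```python
import socket


def start_unbound() -> socket.socket:
    """Create the server socket without binding it.

    The application can finish its (slow) startup at this point.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


def bind_and_listen(sock: socket.socket, host: str, port: int) -> int:
    """The separate 'bind the connector' step; returns the bound port."""
    sock.bind((host, port))
    sock.listen()
    return sock.getsockname()[1]


def swap(old: socket.socket, new: socket.socket, host: str = "127.0.0.1") -> int:
    """Unbind version 1 and bind the already started version 2 to its port."""
    port = old.getsockname()[1]
    old.close()                              # version 1 stops accepting
    return bind_and_listen(new, host, port)  # version 2 takes over the port
```

The only window in which nobody listens is between `old.close()` and `new.bind(...)`, which is the short gap the paragraph above refers to.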

5. Advanced port binding:

By breaking the myth of 'Address already in use', both the old process and the new process bind to the same port. The SO_REUSEPORT option, when turned on, lets two (or more) processes bind to the same port. Once the new process has bound to the port, kill the old process.
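
A small demonstration of that behaviour, assuming a platform where Python exposes `socket.SO_REUSEPORT` (for example Linux 3.9 or later). The helper name `listen_reuseport` is made up, and the two listeners would normally live in two different processes; a single process is used here only to keep the sketch short.

```python
import socket


def listen_reuseport(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Open a listening socket that allows other listeners on the same port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(), and on every socket sharing the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


def take_over(old: socket.socket) -> socket.socket:
    """Bind the new listener to the old listener's port, then retire the old one."""
    port = old.getsockname()[1]
    new = listen_reuseport(port)  # no 'Address already in use' here
    old.close()                   # from now on only the new listener accepts
    return new
```

One caveat to keep in mind: connections that are already queued on the old listener but not yet accepted may be lost when it is closed, so the old process should drain its accept queue before exiting.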

The SO_REUSEPORT option addresses two issues:

  1. The small glitch between the application version switching: The node can serve traffic all the time, effectively giving us zero downtime.

  2. Improved scheduling.

In Summary:

By combining both late binding and port reuse, we can effectively achieve zero downtime. And if we keep the standby process around, we will be able to do an instant rollback as well.

Ravindra babu answered Oct 12 '22 14:10