
How to prevent downtime in App Engine Flex when instances are automatically restarted

Situation

  • custom runtime (Docker/Node) on App Engine Flex
  • manually scaled to a single instance, as we manage the resources ourselves (2 CPU / 6 GB RAM)
  • liveness and readiness checks are configured
  • as expected, VM instances are automatically restarted on a weekly basis to apply OS / system updates
  • this is visible in the Activity pane of the Google Cloud Console
  • Stackdriver logs confirm this activity (e.g. shutdown-script: INFO Starting shutdown scripts. and startup-script: INFO Starting startup scripts.)
  • no instance is available during these restarts, resulting in 503 errors when visiting the application running on the instance
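
For reference, a setup like the one described above would look roughly like this in app.yaml. This is only a sketch, not the asker's actual file: the scaling and resource values come from the list above, while the health check paths and thresholds are assumed values.

```yaml
# app.yaml (sketch; health check paths and thresholds are assumptions)
runtime: custom
env: flex

manual_scaling:
  instances: 1            # the single instance from the question

resources:
  cpu: 2
  memory_gb: 6

liveness_check:
  path: "/liveness_check"     # assumed path
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2

readiness_check:
  path: "/readiness_check"    # assumed path
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
```

With instances set to 1, there is no second VM to serve traffic while the only one restarts, which matches the 503 errors described.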

Goal

  • to have some control over the number of instances to prevent downtime
  • e.g. temporarily scale to 2 instances while 1 instance is restarting
  • keeping control of the available resources (CPU / RAM)

Question

We've considered simply having 2 instances available at all times, but are worried both would be restarted at the same time since they are part of the same instance group.

What would allow us to keep everything up and running while still controlling the number of instances / resources used?

Adriaan Meuris asked Jul 16 '19 10:07


2 Answers

I have a flex app with two instances running for similar reasons. For me, an instance will occasionally exceed memory limits and need to be restarted. Since I have a second instance, there should always be an instance available.
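
A two-instance setup like this one could be sketched in app.yaml as follows. The per-instance resource values are an assumption made for illustration: they halve the 2 CPU / 6 GB from the question so the total stays the same, which only works if the app fits into the smaller size.

```yaml
# app.yaml fragment (sketch; resource split is an assumption)
manual_scaling:
  instances: 2            # a second instance can serve while the other restarts

resources:
  cpu: 1                  # assumed: half of the original 2 CPU per instance
  memory_gb: 3            # assumed: half of the original 6 GB per instance
```

This keeps the instance count fixed, so the resources used stay predictable. It does not by itself guarantee that the two instances are never restarted at the same time.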

I hadn't considered the Google updates to my instances. I just checked my recent history, and Google restarted my two instances yesterday. The restarts were 7 minutes apart so, at least in this example, my users always had an instance available to them.

I suspect that Google does not simultaneously restart all of your instances. This would create a brief period of downtime for all flex customers, and nobody wants downtime for a cloud service.

UPDATE:

This is a guess, but I expect that when Google updates a flex instance, it will create a new instance and only shut down the old instance after the new instance is available. At least, if I were running Google, that is how I would do it. That way you have 100% uptime and you will very briefly have an extra instance running. This would even work with a single flex instance.

gaefan answered Oct 13 '22 02:10


Maybe you should try automatic scaling, as described in the documentation under "Scaling instances".

This allows your application to automatically create instances based on request rate, response latencies, and other application metrics. When one of your instances gets shut down, another instance could be created to "cover" for the missing one. Thus, your service won't get interrupted.
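
As a rough sketch, automatic scaling with bounded instance counts could look like this in app.yaml. The block would replace manual_scaling, since only one scaling type can be set per service. All numbers are assumed values for illustration, not recommendations.

```yaml
# app.yaml fragment (sketch; all values are assumptions)
automatic_scaling:
  min_num_instances: 2        # keep at least two so one can restart
  max_num_instances: 3        # upper bound to keep resource usage under control
  cool_down_period_sec: 180
  cpu_utilization:
    target_utilization: 0.6
```

Setting min_num_instances and max_num_instances close together keeps the number of instances, and therefore the resources used, within a narrow range while still allowing a replacement instance to be created.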

Kevin Quinzel answered Oct 13 '22 02:10