I am working on an app using Google Compute Engine and would like to use preemptible instances.
I need my code to respond to the 30s warning Google gives via an ACPI G2 Soft Off signal, which is sent when they are about to take away your VM, as described here: https://cloud.google.com/compute/docs/instances/preemptible.
How do I detect this event in my Python code that is running on the machine and react to it accordingly? In my case I need to put the job the VM was working on back on a queue of open jobs so that a different machine can take it.
I am not answering the question directly, but I think that your actual intent is different:
gcloud compute instances stop command (or the corresponding API, which it calls);
GCE does not send a "30s termination warning" with the power button event. It just sends the normal, honest power button soft-off event that immediately initiates shutdown of the system.
The "warning" part that comes with it is simple: “Here is your power button event, shut down the OS ASAP, because you have 30s before we pull the plug out of the wall socket. You've been warned!”
You have two system services that you can combine in different ways to get the desired behavior.
The most kosher (and, AFAIK, the only supported) way of handling the ACPI power button event is to let the system handle it, and execute what you want in the instance shutdown script. On a systemd-managed machine, the default GCP shutdown script is simply invoked by a Type=oneshot service's ExecStop= command (see systemd.service(5)). The script is run relatively late in the shutdown sequence.
If you must ensure that the shutdown script is run after (or before) one of your services is sent a signal to terminate, you can modify some of the service dependencies. Things to keep in mind:
- After and Before are reversed on shutdown: if X is started after Y, then it's stopped before Y.
- The After dependency ensures that the service in the sequence is told to terminate before the shutdown script is run. It does not ensure that the service has already terminated.
- google-shutdown-scripts.service is stopped as part of system shutdown.
With all that in mind, you can do sudo systemctl edit google-shutdown-scripts.service. This will create an empty configuration override file and open your $EDITOR, where you can put your After and Before dependencies, for example,
[Unit]
# Make sure that shutdown script is run (synchronously) *before* mysvc1.service is stopped.
After=mysvc1.service
# Make sure that mysvc2.service is sent a command to stop before the shutdown script is run
Before=mysvc2.service
You may specify as many After or Before clauses as you want, 0 or more of each. Read systemd.unit(5) for more information.
There is an instance metadatum v1/instance/preempted. If the instance is preempted, its value is TRUE, otherwise it's FALSE.
GCP has thorough documentation on working with instance metadata. In short, there are two ways you can use this (or any other) metadata value:
Query its value at any time, e.g. in the shutdown script. curl(1) equivalent:
curl -sfH 'Metadata-Flavor: Google' \
'http://169.254.169.254/computeMetadata/v1/instance/preempted'
Run an HTTP request that will complete (200) when the metadatum changes. The only change that can ever happen to it is from FALSE to TRUE, as preemption is irreversible.
curl -sfH 'Metadata-Flavor: Google' \
'http://169.254.169.254/computeMetadata/v1/instance/preempted?wait_for_change=true'
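Since the question is about Python, here is roughly what the two curl calls above could look like with only the standard library. This is a minimal sketch, not a tested GCE client: the function names build_request, parse_preempted and is_preempted are made up for illustration, and the URL and header are simply copied from the curl commands.

```python
import urllib.request

# Same endpoint and header as in the curl commands above.
METADATA_URL = "http://169.254.169.254/computeMetadata/v1/instance/preempted"


def build_request(wait_for_change=False):
    """Build the GET request; optionally use the blocking wait_for_change form."""
    url = METADATA_URL + ("?wait_for_change=true" if wait_for_change else "")
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def parse_preempted(body):
    """The metadatum is the literal text TRUE or FALSE."""
    return body.strip() == "TRUE"


def is_preempted(wait_for_change=False, timeout=None):
    """Query the metadata server once and return True if the VM is preempted.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses and
    urllib.error.URLError on connection problems; the caller decides
    how to handle those.
    """
    request = build_request(wait_for_change)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_preempted(response.read().decode("utf-8"))
```

Calling is_preempted() corresponds to the first curl command, and is_preempted(wait_for_change=True) to the second, which blocks until the value changes.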
Caveat: The metadata server may return a 503 response if it is temporarily unavailable (this is very rare, but it happens), so some retry logic is required. This is especially true for the long-running second form (with ?wait_for_change=true), as the pending request may return at any time with code 503. Your code should be ready to handle this and restart the query. curl does not return the HTTP error code directly, but if you are scripting it, you can rely on the fact that x=$(curl ....) yields an empty string on failure, because -f suppresses the response body on HTTP errors; your criterion for positive detection of preemption is [[ $x == TRUE ]] in this case.
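In Python, the retry logic described in this caveat could be sketched as below. The names fetch_metadata and wait_for_preemption, the one-second retry delay, and the decision to treat connection-level errors as transient are all my assumptions, not anything GCP prescribes. The fetch and sleep parameters exist only so the loop can be exercised without a real metadata server.

```python
import time
import urllib.error
import urllib.request

# The blocking form of the query, as in the second curl command.
WAIT_URL = (
    "http://169.254.169.254/computeMetadata/v1/instance/preempted"
    "?wait_for_change=true"
)


def fetch_metadata(url):
    """One blocking GET against the metadata server; returns the body text."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()


def wait_for_preemption(fetch=fetch_metadata, url=WAIT_URL,
                        retry_delay=1.0, sleep=time.sleep):
    """Return only once the metadatum reads TRUE, restarting the query on 503."""
    while True:
        try:
            if fetch(url) == "TRUE":
                return
            # Any other value: the request completed without a preemption,
            # so simply issue the blocking query again.
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # not the transient case described above
            sleep(retry_delay)
        except urllib.error.URLError:
            # Connection-level failure; retrying here is an assumption.
            sleep(retry_delay)
```

One way to use it would be to run wait_for_preemption() in a background thread and set a flag when it returns, so the main loop can hand its job back.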
gcloud compute instances stop <vmname> (which also sends the power button event!), query the preempted metadata in the shutdown script.
preempted metadata from the code path which handles the termination signal, if you need to distinguish between different shutdown reasons.
It may well be that the real decision point is whether you have an "active job" that you want to return to the "queue", or not: if your service is requested to stop while holding on to an active job, just return it, regardless of the reason why you are being stopped. But I cannot comment on this, not knowing your actual design.
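To illustrate that last idea, here is a rough Python sketch of a worker that returns its active job when asked to stop, whatever the reason. It assumes the program runs as a systemd service and therefore receives SIGTERM on shutdown (the systemd default). The in-process queue.Queue is only a stand-in for your real job queue, and run_worker, process_step and install_handler are hypothetical names.

```python
import queue
import signal
import threading

# Set by the signal handler, polled by the worker loop.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler trivial: only record that a stop was requested.
    stop_requested.set()


def install_handler():
    # Must be called from the main thread of the program.
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(open_jobs, process_step):
    """Take one job and work on it; put it back if asked to stop midway.

    `open_jobs` stands in for the real job queue. `process_step` is a
    hypothetical callable that does one unit of work on the job and
    returns True once the job is finished.
    """
    job = open_jobs.get()
    while not stop_requested.is_set():
        if process_step(job):
            return True  # job completed normally
    open_jobs.put(job)  # hand the job back so another machine can take it
    return False
```

The work has to be split into steps short enough that the flag is checked well within the 30s window; if you need to know why you are being stopped, the preempted metadatum can be queried after the loop exits.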