
Terraform: wait till the instance is "reachable"

I have some Terraform code with an aws_instance and a null_resource:

resource "aws_instance" "example" {
  ami           = data.aws_ami.server.id
  instance_type = "t2.medium"
  key_name      = aws_key_pair.deployer.key_name

  tags = {
    name = "example"
  }

  vpc_security_group_ids = [aws_security_group.main.id]
}

resource "null_resource" "example" {
  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
  }
}

It kinda works, but sometimes it fails (probably when the instance is still in a pending state). When I rerun Terraform, it works as expected.

Question: How can I run local-exec only when the instance is running and accepting an SSH connection?

kharandziuk asked Jun 16 '20 07:06


2 Answers

The null_resource only waits until the aws_instance resource has completed, which in turn only waits until the AWS API reports the instance as being in the Running state. There can be a long gap between that point and the OS having booted far enough to accept SSH connections, so your local-exec provisioner may run before Ansible is able to connect.

One way to handle this is to run a remote-exec provisioner against the instance first, because it keeps retrying the connection until the instance accepts it or the connection timeout expires. Changing your existing code to handle this would look like this:

resource "aws_instance" "example" {
  ami           = data.aws_ami.server.id
  instance_type = "t2.medium"
  key_name      = aws_key_pair.deployer.key_name

  tags = {
    name = "example"
  }

  vpc_security_group_ids = [aws_security_group.main.id]
}

resource "null_resource" "example" {
  provisioner "remote-exec" {
    connection {
      host        = aws_instance.example.public_dns
      user        = "centos"
      private_key = file("files/id_rsa")
    }

    inline = ["echo 'connected!'"]
  }

  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
  }
}

This will first attempt to connect to the instance's public DNS address as the centos user with the files/id_rsa private key. Once it is connected it will then run echo 'connected!' as a simple command before moving on to your existing local-exec provisioner that runs Ansible against the instance.
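
If the instance is slow to boot, it may help to make the wait explicit. The following is a sketch of one possible variation, not part of the original code: it raises the connection block's timeout (Terraform's default is 5 minutes; the 10m value here is an assumption) and adds a triggers map so the null_resource runs again when the instance is replaced. The instance_id key name is an arbitrary choice for illustration.

```hcl
resource "null_resource" "example" {
  # Re-run the provisioners whenever the instance is replaced.
  # "instance_id" is an arbitrary key name chosen for this sketch.
  triggers = {
    instance_id = aws_instance.example.id
  }

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      host        = aws_instance.example.public_dns
      user        = "centos"
      private_key = file("files/id_rsa")
      # Keep retrying the SSH connection for up to 10 minutes
      # (assumed value; the default is 5m).
      timeout     = "10m"
    }

    inline = ["echo 'connected!'"]
  }

  # The existing local-exec provisioner that runs Ansible goes here, unchanged.
}
```

Without triggers, a null_resource that has already been created is not run again when the instance changes, so the playbook would be skipped for a replacement instance.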

Note that being able to connect over SSH may not be enough to provision the instance. If your Ansible playbook uses the package manager, you may find it locked by the instance's user data script, which can still be running at that point. In that case you will need to remotely execute a script that waits for cloud-init to complete first. An example script looks like this:

#!/bin/bash

# cloud-init creates this file once it has finished running
while [ ! -f /var/lib/cloud/instance/boot-finished ]; do
  echo -e "\033[1;36mWaiting for cloud-init...\033[0m"
  sleep 1
done
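
One way to run such a script from Terraform is the remote-exec provisioner's script argument, which copies a local file to the instance and executes it there. The sketch below assumes the script above has been saved as files/wait-for-cloud-init.sh; that path is hypothetical, and the rest mirrors the earlier example.

```hcl
resource "null_resource" "example" {
  provisioner "remote-exec" {
    connection {
      host        = aws_instance.example.public_dns
      user        = "centos"
      private_key = file("files/id_rsa")
    }

    # Hypothetical path: the wait script above, saved next to the key.
    # Terraform uploads it to the instance and runs it there.
    script = "files/wait-for-cloud-init.sh"
  }

  # Runs only after the remote-exec provisioner above has succeeded.
  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
  }
}
```

On images whose cloud-init version supports it, replacing script with inline = ["cloud-init status --wait"] should have a similar effect; check that the subcommand exists on your AMI before relying on it.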

ydaetskcoR answered Nov 19 '22 05:11


There is an Ansible-specific solution for this problem. Add this code to your playbook (there is also a pre_tasks clause if you use roles):

- name: will wait till reachable
  hosts: all
  gather_facts: no # important
  tasks:
    - name: Wait for system to become reachable
      wait_for_connection:

    - name: Gather facts for the first time
      setup:

kharandziuk answered Nov 19 '22 05:11