Why does terraform + apt-get fail, intermittently?

I'm using Terraform to create multiple EC2 nodes on AWS:

resource "aws_instance" "myapp" {
    count = "${var.count}"
    ami = "${data.aws_ami.ubuntu.id}"
    instance_type = "m4.large"
    vpc_security_group_ids = ["${aws_security_group.myapp-security-group.id}"]
    subnet_id = "${var.subnet_id}"
    key_name = "${var.key_name}"
    iam_instance_profile = "${aws_iam_instance_profile.myapp_instance_profile.id}"

    connection {
        user = "ubuntu"
        private_key = "${file("${var.key_file_path}")}"
    }

    provisioner "remote-exec" {
        inline = [
            "sudo apt-get update",
            "sudo apt-get upgrade -y",
            "sudo apt-get install -f -y openjdk-7-jre-headless git awscli"
        ]
    }
}

When I run this with, say, count = 4, some nodes intermittently fail with apt-get errors like:

aws_instance.myapp.1 (remote-exec): E: Unable to locate package awscli

while the other 3 nodes found awscli just fine. All nodes are created from the same AMI and run exactly the same provisioning commands, so why would only some of them fail? The variation could come from:

  • Multiple copies of AMIs on amazon, which aren't identical
  • Multiple apt-get mirrors which aren't identical

Which is more likely? Any other possibilities I'm missing?
Is there an apt-get "force" type flag I can use that will make the provisioning more repeatable?

The whole point of automating provisioning through scripts is to avoid this kind of variation between nodes :/
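To test the mirror hypothesis directly, one could compare which APT mirrors each node is configured to use. A minimal POSIX sh sketch, assuming one-line "deb" entries in /etc/apt/sources.list and /etc/apt/sources.list.d/*.list (the newer deb822 .sources format is not handled); the function name mirror_hosts is hypothetical:

```shell
#!/bin/sh
# mirror_hosts FILE...
# Hypothetical helper: prints the unique mirror hostnames referenced by
# active "deb"/"deb-src" lines in the given APT source files.
mirror_hosts() {
  # Keep only uncommented deb/deb-src lines (missing files are ignored).
  grep -hE '^[[:space:]]*deb(-src)?[[:space:]]' "$@" 2>/dev/null |
    # Pick the first field that looks like a URI (skips "[arch=...]" options).
    awk '{ for (i = 2; i <= NF; i++) if ($i ~ /:\/\//) { print $i; break } }' |
    # Strip the scheme and the path, leaving just the host.
    sed -E 's#^[a-z]+://##; s#/.*$##' |
    sort -u
}
```

Run it on each node, e.g. mirror_hosts /etc/apt/sources.list /etc/apt/sources.list.d/*.list, and diff the outputs; identical hosts on a failing and a working node would point away from the mirror explanation.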

asked Oct 26 '25 08:10 by RaGe


1 Answer

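The remote-exec provisioner just turns your inline commands into a shell script, uploads it to the new instance, and runs it. Most likely you are running into a conflict with cloud-init, which is configured to run on first boot of the standard Ubuntu AMIs: your provisioner starts while cloud-init is still running, so the two race. For example, if cloud-init is still configuring the APT sources or holding the APT lock when your apt-get update runs, that update can fail or be incomplete, and the later apt-get install cannot locate awscli. Because the timing differs from instance to instance, only some of your nodes are affected.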
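Besides waiting for cloud-init, a complementary hardening (not part of this answer's fix) is to retry the apt-get calls themselves. A minimal POSIX sh sketch; the function name retry and its argument order are hypothetical, not an apt-get option:

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0, at most MAX_ATTEMPTS
# times, sleeping DELAY_SECONDS between attempts. On final failure it
# returns COMMAND's last exit status.
retry() {
  _retry_max=$1
  _retry_delay=$2
  shift 2
  _retry_attempt=1
  while :; do
    "$@" && return 0
    _retry_status=$?   # exit status of the failed COMMAND
    if [ "$_retry_attempt" -ge "$_retry_max" ]; then
      echo "retry: '$*' failed after $_retry_attempt attempts" >&2
      return "$_retry_status"
    fi
    _retry_attempt=$((_retry_attempt + 1))
    sleep "$_retry_delay"
  done
}
```

Usage might look like retry 5 10 sudo apt-get update followed by retry 5 10 sudo apt-get install -y openjdk-7-jre-headless git awscli (5 and 10 are arbitrary). apt-get update may not exit non-zero on every partial download failure, so this is a safety net rather than a replacement for waiting on cloud-init.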

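You can make your script wait until cloud-init has finished: cloud-init creates the file /var/lib/cloud/instance/boot-finished when it completes, so put a wait loop at the start of your inline commands, before the apt-get calls. The POSIX [ ] test is used here instead of bash's [[ ]] so the loop also works if the script is run by /bin/sh (dash on Ubuntu):

until [ -f /var/lib/cloud/instance/boot-finished ]; do
  sleep 1
done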
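The loop above waits forever if cloud-init never finishes. A variant with a timeout, sketched in POSIX sh; wait_for_file and the timeout value are hypothetical choices, and it uses [ ] rather than [[ ]] so it also runs under dash:

```shell
#!/bin/sh
# wait_for_file PATH TIMEOUT_SECONDS
# Hypothetical helper: polls once per second until PATH exists as a regular
# file. Returns 0 when it appears, 1 if TIMEOUT_SECONDS elapse first.
wait_for_file() {
  _wff_path=$1
  _wff_timeout=$2
  _wff_waited=0
  until [ -f "$_wff_path" ]; do
    if [ "$_wff_waited" -ge "$_wff_timeout" ]; then
      echo "timed out after ${_wff_timeout}s waiting for $_wff_path" >&2
      return 1
    fi
    sleep 1
    _wff_waited=$((_wff_waited + 1))
  done
  return 0
}
```

For example, wait_for_file /var/lib/cloud/instance/boot-finished 300 && sudo apt-get update (300 is arbitrary). Since a multi-line function is awkward as separate inline strings, it may be easier to ship it in a local script passed to remote-exec's script argument.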

Alternatively, you can let cloud-init install the packages for you, so there is no separate apt-get run racing with it. Pass the configuration as user data from Terraform like this (modified from your snippet above):

resource "aws_instance" "myapp" {
    count = "${var.count}"
    ami = "${data.aws_ami.ubuntu.id}"
    instance_type = "m4.large"
    vpc_security_group_ids = ["${aws_security_group.myapp-security-group.id}"]
    subnet_id = "${var.subnet_id}"
    key_name = "${var.key_name}"
    iam_instance_profile = "${aws_iam_instance_profile.myapp_instance_profile.id}"

    user_data = "${data.template_cloudinit_config.config.rendered}"
}

# Build the cloud-init user data (requires the template provider)
data "template_cloudinit_config" "config" {
    # Plain-text output: no gzip, no base64 encoding
    gzip = false
    base64_encode = false

    part {
        content_type = "text/cloud-config"
        content = <<EOF
packages:
    - awscli
    - git
    - openjdk-7-jre-headless
EOF
    }
}
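If you would rather not depend on the template provider, the same cloud-config can live in a plain file passed with user_data = "${file("cloud-config.yml")}". Raw user data must begin with the #cloud-config line for cloud-init to treat it as cloud-config. A small POSIX sh sketch that renders such a file from a package list; render_cloud_config and cloud-config.yml are hypothetical names:

```shell
#!/bin/sh
# render_cloud_config PKG...
# Hypothetical helper: writes a minimal cloud-config document to stdout
# that asks cloud-init to install the given packages on first boot.
render_cloud_config() {
  if [ "$#" -eq 0 ]; then
    echo "usage: render_cloud_config PKG..." >&2
    return 1
  fi
  # Header line required for raw user data, then the packages list.
  printf '%s\n' '#cloud-config' 'packages:'
  for pkg in "$@"; do
    printf '  - %s\n' "$pkg"
  done
}
```

For example: render_cloud_config awscli git openjdk-7-jre-headless > cloud-config.yml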
answered Oct 28 '25 23:10 by 逆さま