
Terraform gets stuck when instance_count is more than 2 while using remote-exec provisioner

  • I am trying to provision multiple Windows EC2 instances with Terraform's remote-exec provisioner using null_resource.

$ terraform -v
Terraform v0.12.6
provider.aws v2.23.0
provider.null v2.1.2

  • Originally, I was working with three remote-exec provisioners (two of which rebooted the instance) without null_resource, and for a single instance everything worked absolutely fine.
  • I then needed to increase the count and, based on several links, ended up using null_resource. I have now reduced the issue to the point where I cannot run even one remote-exec provisioner for more than 2 Windows EC2 instances using null_resource.

Terraform template to reproduce the error message:

//VARIABLES

variable "aws_access_key" {
  default = "AK"
}
variable "aws_secret_key" {
  default = "SAK"
}
variable "instance_count" {
  default = "3"
}
variable "username" {
  default = "Administrator"
}
variable "admin_password" {
  default = "Password"
}
variable "instance_name" {
  default = "Testing"
}
variable "vpc_id" {
  default = "vpc-id"
}

//PROVIDERS
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "ap-southeast-2"
}

//RESOURCES
resource "aws_instance" "ec2instance" {
  count         = "${var.instance_count}"
  ami           = "Windows AMI"
  instance_type = "t2.xlarge"
  key_name      = "ec2_key"
  subnet_id     = "subnet-id"
  vpc_security_group_ids = ["${aws_security_group.ec2instance-sg.id}"]
  tags = {
    Name = "${var.instance_name}-${count.index}"
  }
}

resource "null_resource" "nullresource" {
  count = "${var.instance_count}"
  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }
  provisioner "remote-exec" {
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}"
    ]
  }
  // provisioner "local-exec" {
  //   command = "powershell.exe Write-Host Instance_No=${count.index}"
  // }
  // provisioner "file" {
  //   source      = "testscript"
  //   destination = "D:/testscript"
  // }
}

resource "aws_security_group" "ec2instance-sg" {
  name   = "${var.instance_name}-sg"
  vpc_id = "${var.vpc_id}"

  // RDP
  ingress {
    from_port   = 3389
    to_port     = 3389
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }

  // WinRM access from the machine running TF to the instance
  ingress {
    from_port   = 5985
    to_port     = 5985
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }

  tags = {
    Name = "${var.instance_name}-sg"
  }
}

//OUTPUTS
output "private_ip" {
  value = "${aws_instance.ec2instance.*.private_ip}"
}

Observations:

  • With one remote-exec provisioner, it works fine if count is set to 1 or 2. With count 3, it is unpredictable whether the provisioner runs on all the instances every time. What is certain, however, is that Terraform never completes and never shows the output variables. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
  • For the local-exec provisioner, everything works fine. Tested with count set to 1, 2 and 7.
  • For the file provisioner, it works fine for 1, 2 and 3. For 7 it does not finish, although the file was copied to all 7 instances. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
  • Also, in every attempt, the remote-exec provisioner is able to connect to the instances irrespective of the value of count. It just does not always trigger the inline command: it randomly skips it on some instances and starts showing the "Still creating..." message.
  • I have been stuck with this issue for quite some time now and couldn't find anything significant in the debug logs either. I know Terraform is not recommended as a config mgmt tool. However, everything works fine, even with complex provisioning scripts, if the instance count is just 1 (even without null_resource), which suggests that Terraform should be able to handle such a basic provisioning requirement.
  • TF_DEBUG logs:
      • count=2: TF completes successfully and shows "Apply complete!".
      • count=3: TF runs remote-exec on all three instances, but does not complete and does not show the output variables. Stuck at "Still creating..."
      • count=3: TF runs remote-exec on only two instances and skips nullresource[1]. It does not complete and does not show the output variables. Stuck at "Still creating..."
  • Any pointers will be greatly appreciated!
st_rt_dl_8 asked Oct 22 '25 03:10


2 Answers

Update: what eventually did the trick was downgrading Terraform to v0.11.14, as per this issue comment.
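
If you go the downgrade route, it may help to pin the version in the configuration so that a newer binary is not used by accident. This is only a sketch: the exact constraint string is an assumption based on the release named above, and it says nothing about whether the rest of the configuration needs changes for that release.

```hcl
terraform {
  # Assumption: pin to the exact release mentioned above. Terraform
  # refuses to run this configuration if the binary version does not match.
  required_version = "= 0.11.14"
}
```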

A few things you can try:

  1. Inline remote-exec:
resource "aws_instance" "ec2instance" {
  count         = "${var.instance_count}"
  # ...
  provisioner "remote-exec" {
    connection {
      # ...
    }
    inline = [
      # ...
    ]
  }
}

Now you can refer to self inside the connection block to get the instance's private IP.

  2. Add triggers to null_resource:
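resource "null_resource" "nullresource" {
  count = "${var.instance_count}" # needed because count.index is used below

  # "triggers" is a map argument, so assign it with "="
  triggers = {
    host    = "${element(aws_instance.ec2instance.*.private_ip, count.index)}" # Rerun when IP changes
    version = "${timestamp()}" # ...or rerun every time
  }
  # ...
}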

You can use the triggers attribute to recreate null_resource and thus re-execute remote-exec.
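
A minimal sketch of the first suggestion, filled in with the connection settings from the question. The variable names and placeholder values ("Windows AMI", and so on) are taken from the question's template; whether moving the provisioner inline avoids the hang is not something this sketch demonstrates.

```hcl
resource "aws_instance" "ec2instance" {
  count         = "${var.instance_count}"
  ami           = "Windows AMI"
  instance_type = "t2.xlarge"
  # ... remaining arguments as in the question ...

  provisioner "remote-exec" {
    connection {
      type     = "winrm"
      # "self" refers to the instance this provisioner is attached to,
      # so no element(...) lookup over the whole list is needed.
      host     = "${self.private_ip}"
      user     = "${var.username}"
      password = "${var.admin_password}"
      timeout  = "10m"
    }

    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}",
    ]
  }
}
```

With the provisioner attached to the instance itself, the separate null_resource is no longer required for this step.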

Aleksi answered Oct 23 '25 23:10


I used this trigger in null_resource and it works perfectly for me. It also works when the number of instances is increased, and it configures all instances. I am using Terraform with OpenStack.

triggers = {
  instance_ids = join(",", openstack_compute_instance_v2.swarm-cluster-hosts[*].id)
}
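
For comparison, here is a sketch of the same trigger applied to the AWS resources from the question. Swapping in aws_instance.ec2instance for the OpenStack resource is an assumption made for illustration, not something this answer tested.

```hcl
resource "null_resource" "nullresource" {
  count = "${var.instance_count}"

  # Assumption: recreate the null_resource (and re-run its provisioners)
  # whenever the set of instance IDs changes.
  triggers = {
    instance_ids = "${join(",", aws_instance.ec2instance.*.id)}"
  }

  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }

  provisioner "remote-exec" {
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}",
    ]
  }
}
```

Because the trigger value is built from every instance ID, adding or replacing any instance changes it for all copies of the null_resource, so provisioning re-runs on every instance rather than only on the new one.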

Niaz Hussain answered Oct 23 '25 22:10