In a nutshell, we have a platform that comprises several applications/servers. Terraform is used to manage both the AWS infrastructure (VPC, subnets, IGW, security groups, ...) and the application deployments (using Ansible as a provisioner from Terraform). For each deployment, Packer builds all the AMIs and tags them with an appropriate name so that Terraform picks up the latest AMIs.
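To make "Terraform picks up the latest AMI" concrete: Terraform's `aws_ami` data source with `most_recent = true` performs this lookup. The Python sketch below mimics that selection over EC2 `DescribeImages`-style records. The `App` tag key and the sample AMI IDs are hypothetical, not the naming scheme used in the question.

```python
from datetime import datetime
from typing import List, Optional


def _tags(image: dict) -> dict:
    """Flatten the EC2-style [{'Key': k, 'Value': v}] tag list into a dict."""
    return {t["Key"]: t["Value"] for t in image.get("Tags", [])}


def _created(image: dict) -> datetime:
    # EC2 reports CreationDate like "2019-05-01T10:00:00.000Z".
    return datetime.strptime(image["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ")


def latest_ami(images: List[dict], app: str) -> Optional[str]:
    """Return the ImageId of the newest AMI tagged App=<app>, or None.

    "App" is a hypothetical tag key assumed to be set by the Packer template.
    """
    candidates = [img for img in images if _tags(img).get("App") == app]
    if not candidates:
        return None
    return max(candidates, key=_created)["ImageId"]


if __name__ == "__main__":
    images = [
        {"ImageId": "ami-111", "CreationDate": "2019-05-01T10:00:00.000Z",
         "Tags": [{"Key": "App", "Value": "api"}]},
        {"ImageId": "ami-222", "CreationDate": "2019-05-03T09:30:00.000Z",
         "Tags": [{"Key": "App", "Value": "api"}]},
        {"ImageId": "ami-333", "CreationDate": "2019-05-04T08:00:00.000Z",
         "Tags": [{"Key": "App", "Value": "worker"}]},
    ]
    print(latest_ami(images, "api"))     # ami-222
    print(latest_ami(images, "worker"))  # ami-333
```

Selecting by tag plus creation date, rather than by a hard-coded ID, is what lets a fresh Packer build flow into the next `terraform apply` without editing the configuration.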
The process generally works, but we face a dilemma when we want to deploy small hotfixes. These can happen quite frequently, because each deployment may surface regressions once QA tests it. So for each application that needs a hotfix (not necessarily all of them), we create a hotfix branch and build the artifact (a jar or a deb package). Then there are two options:
1. Build new AMIs with Packer that include the hotfix artifact and run terraform apply to roll them out.
2. Deploy the artifact directly onto the existing servers with Ansible.
The first approach ensures that the immutable-infrastructure idea is followed, but it also has downsides. Any small change in the Terraform configuration or in the infrastructure shows up in terraform plan. For example, a security group may have changes made outside Terraform (e.g., IPs whitelisted for some feature), and running terraform apply would revert them. The whole process of building AMIs and running terraform apply is also quite heavy.
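A minimal Python sketch of why `terraform apply` reverts out-of-band changes, assuming the security group's rules are declared inline so Terraform treats its rule list as authoritative. The `Rule` type, ports and CIDRs are illustrative: anything that exists in AWS but not in the configuration lands on the "remove" side of the plan.

```python
from typing import Dict, List, NamedTuple, Set


class Rule(NamedTuple):
    protocol: str
    port: int
    cidr: str


def plan(desired: Set[Rule], actual: Set[Rule]) -> Dict[str, List[Rule]]:
    """Rough analogue of `terraform plan` for an authoritative rule set."""
    return {
        "add": sorted(desired - actual),     # in config, missing in AWS
        "remove": sorted(actual - desired),  # in AWS, missing in config
    }


if __name__ == "__main__":
    # Rules declared in the Terraform configuration (illustrative values).
    desired = {Rule("tcp", 443, "0.0.0.0/0")}
    # Live state: someone whitelisted a partner IP by hand for a feature.
    actual = desired | {Rule("tcp", 22, "203.0.113.10/32")}
    # The hand-added rule shows up under "remove", i.e. apply would delete it.
    print(plan(desired, actual))
```

If those whitelist entries are intentionally managed outside Terraform, the `lifecycle { ignore_changes = [ingress] }` meta-argument on the security group is one way to keep `apply` from reverting them, at the cost of Terraform no longer managing those rules.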
We're leaning toward the second approach because it's easy, but we still wonder whether it's good practice.
For code changes, I recommend using Packer to build AMIs as part of your CI pipeline. Managing launch configuration changes with Terraform and ASGs can definitely be cumbersome, given how buggy it can be, but I think the result is much cleaner and safer than updating code with Ansible. You do technically have a "record" of changes, since you know your Ansible playbooks and what state they are in, but I think deployments should be driven from a CI pipeline that builds immutable artifacts.
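One way to wire up such a CI pipeline is to let Packer's `manifest` post-processor record the AMI it built and hand that ID to Terraform explicitly. The sketch assumes an `amazon-ebs` builder, whose `artifact_id` has the form `region:ami-id`, and a hypothetical Terraform input variable named `ami_id`; it only builds the command line rather than running it.

```python
import json
from typing import List


def ami_from_manifest(manifest_json: str, region: str) -> str:
    """Return the AMI ID produced by the most recent `packer build` run.

    Assumes Packer's manifest post-processor output with an amazon-ebs
    builder, where artifact_id looks like "us-east-1:ami-0abc...".
    """
    manifest = json.loads(manifest_json)
    last_run = manifest["last_run_uuid"]
    for build in manifest["builds"]:
        if build["packer_run_uuid"] != last_run:
            continue  # older runs accumulate in the same manifest file
        # Multi-region builds list several "region:ami" pairs, comma-separated.
        for pair in build["artifact_id"].split(","):
            build_region, ami_id = pair.split(":", 1)
            if build_region == region:
                return ami_id
    raise LookupError(f"no AMI for {region} in the last Packer run")


def terraform_apply_cmd(ami_id: str) -> List[str]:
    # `ami_id` is a hypothetical input variable feeding the launch configuration.
    return ["terraform", "apply", "-auto-approve", f"-var=ami_id={ami_id}"]


if __name__ == "__main__":
    # Trimmed manifest: real files carry more fields per build.
    sample = json.dumps({
        "builds": [
            {"artifact_id": "eu-west-1:ami-0aaa", "packer_run_uuid": "run-1"},
            {"artifact_id": "eu-west-1:ami-0bbb,us-east-1:ami-0ccc",
             "packer_run_uuid": "run-2"},
        ],
        "last_run_uuid": "run-2",
    })
    print(terraform_apply_cmd(ami_from_manifest(sample, "eu-west-1")))
```

Passing the ID explicitly makes each apply traceable to one specific build, which fits the "record of changes" argument for immutable artifacts.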
If you really want to stick with just Ansible, you can always bake into your user data an Ansible playbook that pulls in the latest code from master (or whichever branch you deploy from). This ensures that new hosts come up with the latest code, and you can manually invoke Ansible against pre-existing hosts. Alternatively, you can rotate EC2 instances to update code by doubling the desired capacity and scaling back down once the new instances are healthy. All of this can be highly automated and would give you a pseudo-canary deployment. Again, though, I'd recommend sticking with immutable builds.
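The instance-rotation idea can be sketched against the boto3 Auto Scaling client methods `describe_auto_scaling_groups` and `set_desired_capacity`. It assumes new instances fetch the latest code at boot (for example via the user-data playbook just described), that the group's `MaxSize` already allows double capacity, and that its termination policy is `OldestInstance` so scale-in removes the old hosts. `FakeAutoScaling` and the group name `web-asg` are hypothetical, there only so the logic can run without AWS.

```python
import time
from typing import Callable, List


def healthy_in_service(group: dict) -> int:
    """Count instances that are both InService and Healthy."""
    return sum(
        1 for inst in group["Instances"]
        if inst["LifecycleState"] == "InService" and inst["HealthStatus"] == "Healthy"
    )


def rotate_asg(client, name: str, timeout: float = 900, poll: float = 15,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Double desired capacity, wait until the new hosts are healthy, scale back.

    `client` is a boto3 "autoscaling" client or anything exposing the same
    two methods. Scale-in only removes the old hosts if the group's
    termination policy is OldestInstance (an assumption of this sketch).
    """
    def describe() -> dict:
        resp = client.describe_auto_scaling_groups(AutoScalingGroupNames=[name])
        return resp["AutoScalingGroups"][0]

    group = describe()
    original = group["DesiredCapacity"]
    target = original * 2
    if group["MaxSize"] < target:
        raise ValueError(f"MaxSize {group['MaxSize']} is below {target}")

    client.set_desired_capacity(AutoScalingGroupName=name, DesiredCapacity=target)
    waited = 0.0
    while healthy_in_service(describe()) < target:
        if waited >= timeout:
            # Leave the group doubled so someone can inspect the new hosts.
            raise TimeoutError(f"{name}: new hosts not healthy after {timeout}s")
        sleep(poll)
        waited += poll
    client.set_desired_capacity(AutoScalingGroupName=name, DesiredCapacity=original)


class FakeAutoScaling:
    """Hypothetical in-memory stand-in for local runs (not part of boto3)."""

    def __init__(self, desired: int, max_size: int):
        self.history: List[int] = []
        self.group = {
            "DesiredCapacity": desired,
            "MaxSize": max_size,
            "Instances": [self._inst(f"i-old{n}", "InService") for n in range(desired)],
        }

    @staticmethod
    def _inst(instance_id: str, state: str) -> dict:
        return {"InstanceId": instance_id, "LifecycleState": state, "HealthStatus": "Healthy"}

    def describe_auto_scaling_groups(self, AutoScalingGroupNames):
        # Each poll brings one Pending instance into service.
        for inst in self.group["Instances"]:
            if inst["LifecycleState"] == "Pending":
                inst["LifecycleState"] = "InService"
                break
        return {"AutoScalingGroups": [self.group]}

    def set_desired_capacity(self, AutoScalingGroupName, DesiredCapacity):
        self.history.append(DesiredCapacity)
        insts = self.group["Instances"]
        if DesiredCapacity > len(insts):
            insts += [self._inst(f"i-new{n}", "Pending")
                      for n in range(DesiredCapacity - len(insts))]
        else:
            # Mimic an OldestInstance termination policy: drop the oldest first.
            del insts[:len(insts) - DesiredCapacity]
        self.group["DesiredCapacity"] = DesiredCapacity


if __name__ == "__main__":
    # Real use (not run here): rotate_asg(boto3.client("autoscaling"), "web-asg")
    fake = FakeAutoScaling(desired=2, max_size=4)
    rotate_asg(fake, "web-asg", sleep=lambda _: None)
    print(fake.history)  # [4, 2]
```

Raising on timeout instead of scaling back keeps the old hosts in place if the new code fails its health checks, which is what gives the rotation its canary-like safety.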
Out of curiosity, is there any reason you're not using Docker? I'm sure you have a good business reason, but moving to Docker simplifies a lot of this as well: it's much easier to build a Docker image and update an ECS task definition than to deploy an entirely new AMI/EC2 instance.
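For comparison, the ECS flow boils down to: read the current task definition, swap the image tag, register a new revision, and point the service at it. A minimal sketch of the middle step follows. The container name, image tags, cluster/service names and the list of read-only keys are assumptions to verify against your boto3 version; the real API calls appear only as comments.

```python
import copy

# Keys that DescribeTaskDefinition returns but RegisterTaskDefinition rejects.
# Assumed list; check it against the botocore version you run.
READ_ONLY_KEYS = {
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
}


def with_new_image(task_def: dict, container: str, image: str) -> dict:
    """Build register_task_definition kwargs with one container's image replaced."""
    new_def = copy.deepcopy({k: v for k, v in task_def.items() if k not in READ_ONLY_KEYS})
    matched = False
    for c in new_def["containerDefinitions"]:
        if c["name"] == container:
            c["image"] = image
            matched = True
    if not matched:
        raise KeyError(f"container {container!r} not in task definition")
    return new_def


if __name__ == "__main__":
    # Hypothetical names; with a real client the round trip would be:
    #   ecs = boto3.client("ecs")
    #   current = ecs.describe_task_definition(taskDefinition="api")["taskDefinition"]
    #   resp = ecs.register_task_definition(**with_new_image(current, "api", "registry/api:1.4.2"))
    #   ecs.update_service(cluster="prod", service="api",
    #                      taskDefinition=resp["taskDefinition"]["taskDefinitionArn"])
    current = {
        "family": "api", "revision": 7, "status": "ACTIVE",
        "containerDefinitions": [{"name": "api", "image": "registry/api:1.4.1"}],
    }
    print(with_new_image(current, "api", "registry/api:1.4.2"))
```

Each hotfix then becomes a new task definition revision, so rolling back is just pointing the service at the previous revision.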