 

How to implement the "One Binary" principle with Docker

The One Binary principle, explained here: http://programmer.97things.oreilly.com/wiki/index.php/One_Binary, states that one should...

"Build a single binary that you can identify and promote through all the stages in the release pipeline. Hold environment-specific details in the environment. This could mean, for example, keeping them in the component container, in a known file, or in the path."

I see many DevOps engineers arguably violate this principle by creating one Docker image per environment (e.g. my-app-qa, my-app-prod, and so on). I know that Docker favours immutable infrastructure, which implies not changing an image after deployment, and therefore not uploading or downloading configuration post-deployment. Is there a trade-off between immutable infrastructure and the One Binary principle, or can they complement each other? When it comes to separating configuration from code, what is the best practice in a Docker world? Which one of the following approaches should one take...

1) Creating a base binary image and then having a configuration Dockerfile that augments this image by adding environment-specific configuration (e.g. my-app -> my-app-prod)

2) Deploying a binary-only docker image to the container and passing in the configuration through environment variables and so on at deploy time.

3) Uploading the configuration after deploying the Docker image as a container

4) Downloading the configuration from a configuration management server inside the running container.

5) Keeping the configuration in the host environment and making it available to the running Docker instance through a bind mount.
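To make approaches 2 and 5 concrete, here is a sketch of how a single image could be promoted unchanged while environment-specific details are supplied at run time. The image name, variable names, and paths (my-app, DB_HOST, /etc/my-app) are placeholders invented for the example:

```shell
# Build one image once; promote the same tag/digest through every stage.
docker build -t my-app:1.0.0 .

# Approach 2: pass environment-specific details as environment variables.
docker run -d --read-only \
  -e DB_HOST=db.qa.internal \
  -e LOG_LEVEL=debug \
  my-app:1.0.0

# Approach 5: keep configuration on the host and bind-mount it read-only,
# so the image itself stays identical across environments.
docker run -d --read-only \
  -v /etc/my-app/prod.yml:/etc/my-app/config.yml:ro \
  my-app:1.0.0
```

In both cases the artifact that moves through the pipeline is the same image; only what the environment injects differs.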

Is there another better approach not mentioned above?

How can one enforce the One Binary principle using immutable infrastructure? Can it be done, or is there a trade-off? What is the best practice?

asked May 11 '16 by murungu

1 Answer

I've got about two years of experience deploying Docker containers now, so I'm going to talk about what I've done and/or know to work.

So, let me first begin by saying that containers should definitely be immutable (I even mark mine as read-only).

Main approaches:

  • use configuration files, setting a static entrypoint and overriding the configuration file location through the container startup command - less flexible, since one would have to commit the change and redeploy in order to enable it; not fit for passwords, secure tokens, etc.
  • use configuration files, overriding their location with an environment variable - again, this depends on having the configuration files prepped in advance; not fit for passwords, secure tokens, etc.
  • use environment variables - this might need a change in the deployment code, but it shortens the time to get a config change live, since it doesn't need to go through the application build phase (in most cases), so deploying such a change can be pretty easy. For example, when deploying a containerised application to Marathon, changing an environment variable could potentially just start a new container from the last used image (potentially even on the same host), which means the change could be live in mere seconds; not fit for passwords, secure tokens, etc., and especially so in Docker
  • store the configuration in a k/v store like Consul, make the application aware of it, and let it be dynamically reconfigurable. A great approach for launching features simultaneously - possibly even across multiple services; if implemented with a solution such as HashiCorp Vault, which provides secure storage for sensitive information, you could even have ephemeral secrets (an example would be the PostgreSQL secret backend for Vault - https://www.vaultproject.io/docs/secrets/postgresql/index.html)
  • have an application or script create the configuration files before starting the main application - store the configuration in a k/v store like Consul and use something like consul-template to populate the app config; a bit more secure, since you're not carrying everything through the whole pipeline as code
  • have an application or script populate the environment variables before starting the main application - an example of that would be envconsul; not fit for sensitive information, as someone with access to the Docker API (through either the TCP or the UNIX socket) would be able to read those
  • I've even had a situation in which we were populating variables into AWS instance user_data and injecting them into the container on startup (with a script that modifies the container's JSON config on startup)
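Several of the file- and environment-based approaches above boil down to a precedence scheme: a config file provides defaults, its location can be overridden through the environment, and individual variables win over the file. A minimal Python sketch of that scheme, with made-up variable names (APP_CONFIG_FILE, APP_DB_HOST) and a made-up default path:

```python
import json
import os


def load_config(env=os.environ):
    """Read defaults from a JSON file whose location an environment
    variable may override, then let individual variables take precedence."""
    path = env.get("APP_CONFIG_FILE", "/etc/my-app/config.json")
    config = {"db_host": "localhost", "log_level": "info"}
    if os.path.exists(path):
        with open(path) as fh:
            config.update(json.load(fh))
    # Individual environment variables override values from the file.
    for key in list(config):
        env_key = "APP_" + key.upper()
        if env_key in env:
            config[key] = env[env_key]
    return config
```

For example, `load_config({"APP_LOG_LEVEL": "debug"})` keeps the default `db_host` but overrides `log_level` from the environment, without rebuilding anything.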

The main things that I'd take into consideration:

  • what are the variables I'm exposing, and when and where am I getting their values from (it could be the CD software, or something else) - for example, you could publish the AWS RDS endpoint and credentials to the instance's user_data, potentially even to EC2 tags with some IAM instance profile magic
  • how many variables do we have to manage and how often do we change some of them - if we have a handful, we could probably just go with environment variables, or use environment variables for the most commonly changed ones and variables stored in a file for those that we change less often
  • and how fast do we want to see them changed - if it's a file, it typically takes more time to deploy it to production; if we're using environment variables, we can usually deploy those changes much faster
  • how do we protect some of them - where do we inject them and how - for example Ansible Vault, HashiCorp Vault, keeping them in a separate repo, etc
  • how do we deploy - it could be a JSON config file sent to a deployment framework endpoint, Ansible, etc.
  • what does our environment look like - is it realistic to have something like Consul as a config data store (Consul has 2 different kinds of agents - client and server)
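The "have a script create the configuration files before starting the main application" pattern from the list above can be sketched without consul-template, using only environment variables as the source. The template syntax, variable names, and the idea of exec'ing afterwards are assumptions for illustration:

```python
import os
import string


def render_config(template_text, env=os.environ):
    """Substitute ${VAR} placeholders in a config template,
    the way an entrypoint script might before starting the app."""
    return string.Template(template_text).substitute(env)


if __name__ == "__main__":
    template = "db_host = ${DB_HOST}\nlog_level = ${LOG_LEVEL}\n"
    rendered = render_config(template, {"DB_HOST": "db.prod.internal",
                                        "LOG_LEVEL": "warn"})
    # A real entrypoint would write this to the app's config path,
    # then exec the main process so it stays PID 1 in the container.
    print(rendered)
```

consul-template does essentially this, except the values come from Consul (or Vault) and the file can be re-rendered when they change.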

I tend to prefer the most complex case of having them stored in a central place (k/v store, database) and have them changed dynamically, because I've encountered the following cases:

  • slow deployment pipelines - which makes it really slow to change a config file and have it deployed
  • having too many environment variables - this could really grow out of hand
  • having to turn on a feature flag across the whole fleet (consisting of tens of services) at once
  • an environment in which there is a real drive to increase security by handling sensitive config data better

I've probably missed something, but I guess that should be enough of a trigger to think about what would be best for your environment.

answered Sep 24 '22 by iangelov