Container technologies: docker, rkt, orchestration, kubernetes, GKE and AWS Container Service

I'm trying to get a good understanding of container technologies but am somewhat confused. It seems like certain technologies overlap different portions of the stack and different pieces of different technologies can be used as the DevOps team sees fit (e.g., can use Docker containers but don't have to use the Docker engine, could use engine from cloud provider instead). My confusion lies in understanding what each layer of the "Container Stack" provides and who the key providers are of each solution.

Here's my layman's understanding; would appreciate any corrections and feedback on holes in my understanding

1. Containers: self-contained package including application, runtime environment, system libraries, etc.; like a mini-OS with an application
   - It seems like Docker is the de-facto standard. Any others that are notable and widely used?
2. Container Clusters: groups of containers that share resources
3. Container Engine: groups containers into clusters, manages resources
4. Orchestrator: is this any different from a container engine? How?
   - Where do Docker Engine, rkt, Kubernetes, Google Container Engine, AWS Container Service, etc. fall between #s 2-4?

919

asked Oct 20 '16 21:10

JL1680

1 Answer

This may be a bit long and present some oversimplification but should be sufficient to get the idea across.

Physical machines

Some time ago, the best way to deploy simple applications was to simply buy a new webserver, install your favorite operating system on it, and run your applications there.

[Figure: Traditional model (https://i.stack.imgur.com/OyEgq.png)]

The cons of this model are:

- The processes may interfere with each other (because they share CPU and file system resources), and one may affect the other's performance.
- Scaling this system up/down is difficult as well, taking a lot of effort and time in setting up a new physical machine.
- There may be differences in the hardware specifications, OS/kernel versions and software package versions of the physical machines, which make it difficult to manage these application instances in a hardware-agnostic manner.

Because applications are directly affected by the physical machine's specifications, they may need machine-specific tweaking, recompilation and so on, which means the cluster administrator has to think about each instance at the level of an individual machine. This approach therefore does not scale, which makes it undesirable for deploying modern production applications.

Virtual Machines

Virtual machines solve some of the problems above:

- They provide isolation even while running on the same machine.
- They provide a standard execution environment (the guest OS) irrespective of the underlying hardware.
- They can be brought up on a different machine (replicated) quite quickly when scaling (on the order of minutes).
- Applications typically do not need to be rearchitected to move from physical hardware to virtual machines.

[Figure: virtual machines (https://i.stack.imgur.com/b2Jrk.png)]

But they introduce some problems of their own:

- They consume large amounts of resources, because each one runs an entire instance of an operating system.
- They may not start or shut down as fast as we want them to (we would like this to be on the order of seconds).
- Even with hardware-assisted virtualization, application instances may see significant performance degradation compared with an application running directly on the host. (This may be an issue only for certain kinds of applications.)
- Packaging and distributing VM images is not as simple as it could be. (This is not so much a drawback of the approach as of the existing tooling for virtualization.)

Containers

Then, somewhere along the line, cgroups (control groups) were added to the Linux kernel. This feature lets us place processes in groups, limit the resources each group may consume, and perform resource accounting at the group level. A companion kernel feature, namespaces, decides which other processes, file systems and so on a group of processes can see.

Various container runtimes and engines came along which make it very easy to create a "container": an environment within the OS that has limited visibility (through namespaces) and limited resources (through cgroups). Common examples include Docker, rkt, runC and LXC.
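
To make the kernel side a bit more concrete, here is a minimal sketch (Linux only) that inspects which cgroups and namespaces the current process belongs to. It relies on two kernel interfaces: `/proc/self/cgroup`, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`, and the symlinks under `/proc/self/ns`. The names `CgroupEntry`, `parse_cgroup_line` and `describe_current_process` are made up for this illustration.

```python
import os
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_line(line: str) -> CgroupEntry:
    """Parse one 'hierarchy-ID:controller-list:cgroup-path' line."""
    # Split on the first two colons only, so a path containing ':' survives.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    return CgroupEntry(
        hierarchy_id=int(hierarchy_id),
        # The controller list is empty on a cgroup v2 (unified) hierarchy.
        controllers=[c for c in controllers.split(",") if c],
        path=path,
    )


def describe_current_process() -> None:
    """Print the cgroups and namespaces of the calling process (Linux only)."""
    with open("/proc/self/cgroup") as f:
        for line in f:
            print(parse_cgroup_line(line))
    # Each entry is a symlink such as 'pid -> pid:[<inode number>]'.
    for name in sorted(os.listdir("/proc/self/ns")):
        print(name, "->", os.readlink(f"/proc/self/ns/{name}"))
```

Calling `describe_current_process()` once on the host and once inside a container should show different cgroup paths and different namespace identifiers, which is essentially what a container is from the kernel's point of view.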

[Figure: containers (https://i.stack.imgur.com/k0Ij7.png)]

[Figure: docker/rkt/... (https://i.stack.imgur.com/LCpTS.png)]

Docker, for example, includes a daemon which provides interactions like creating an "image", a reusable entity that can be launched into a container instantly. It also lets one manage individual containers in an intuitive way.

The advantages of containers:

- They are light-weight and run with very little overhead, as they do not have their own instance of the kernel/OS and run on top of a single host OS.
- They offer some degree of isolation between the various containers and the ability to impose limits on various resources consumed by them (using the cgroup mechanism).
- The tooling around them has evolved rapidly to allow easy building of reusable units (images), repositories for storing image revisions (container registries) and so on, largely due to Docker.
- A single container is encouraged to run a single application process, so that it can be maintained and distributed independently. The light-weight nature of a container makes this practical, and the resulting decoupling leads to faster development.
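
As a rough illustration of launching a container from an image with resource limits, the sketch below only builds the argument list for the Docker CLI; it does not run anything by itself. The flags `--detach`, `--name`, `--memory` and `--cpus` are real `docker run` options, while the helper name `docker_run_command`, the image `nginx` and the limit values are just assumptions picked for the example.

```python
from typing import List, Optional


def docker_run_command(
    image: str,
    name: Optional[str] = None,
    memory: Optional[str] = None,
    cpus: Optional[float] = None,
) -> List[str]:
    """Build a 'docker run' argument list with optional resource limits."""
    cmd = ["docker", "run", "--detach"]
    if name:
        cmd += ["--name", name]
    if memory:
        # Memory limit, e.g. "256m"; enforced by the kernel through cgroups.
        cmd += ["--memory", memory]
    if cpus is not None:
        # Number of CPUs the container may use, e.g. 0.5 for half a core.
        cmd += ["--cpus", str(cpus)]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical example values; prints the command instead of running it.
    print(" ".join(docker_run_command("nginx", name="web", memory="256m", cpus=0.5)))
```

On a machine with Docker installed, the returned list could be passed to `subprocess.run(cmd, check=True)` to actually start the container.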

There are some cons as well:

- The level of isolation provided is less than that of VMs.
- They are easiest to use with stateless 12-factor applications built from scratch, and somewhat of a struggle when deploying legacy applications, clustered distributed databases and so on.
- They need orchestration and higher-level primitives to be used effectively and at scale.

Container Orchestration

When running applications in production, complexity grows: an application tends to have many different components, some of which need to scale up or down on demand. Containers by themselves do not solve all our problems. We need a system that solves problems associated with real large-scale applications, such as:

- Networking between containers
- Load balancing
- Managing storage attached to these containers
- Updating containers, scaling them, spreading them across nodes in a multi-node cluster and so on.

When we want to manage a cluster of containers, we use a container orchestration engine. Examples include Kubernetes, Mesos and Docker Swarm. They provide a host of functionality beyond what is listed above, and the goal is to reduce the effort involved in DevOps.
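
To give a feel for the "spreading them across nodes" part, here is a deliberately tiny toy scheduler. It is not how Kubernetes, Mesos or Swarm actually schedule; it just places each replica on the node with the most free CPU, which is one simple way to spread load. The names `Node` and `schedule`, and the idea of tracking only CPU, are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    cpu_capacity: float
    cpu_used: float = 0.0
    containers: List[str] = field(default_factory=list)

    @property
    def cpu_free(self) -> float:
        return self.cpu_capacity - self.cpu_used


def schedule(nodes: List[Node], app: str, replicas: int, cpu_per_replica: float) -> Dict[str, str]:
    """Place each replica on the node with the most free CPU.

    Returns a mapping of replica name to node name and raises RuntimeError
    when no node has enough free CPU left for the next replica.
    """
    placement: Dict[str, str] = {}
    for i in range(replicas):
        candidates = [n for n in nodes if n.cpu_free >= cpu_per_replica]
        if not candidates:
            raise RuntimeError(f"no node has {cpu_per_replica} CPU free for {app}-{i}")
        # On a tie, max() keeps the first node in the list.
        target = max(candidates, key=lambda n: n.cpu_free)
        replica = f"{app}-{i}"
        target.cpu_used += cpu_per_replica
        target.containers.append(replica)
        placement[replica] = target.name
    return placement
```

For example, with two hypothetical nodes of 2 CPUs each, `schedule(nodes, "web", 3, 1.0)` alternates between them rather than filling one node first. A real orchestrator layers much more on top of this: health checks, rescheduling after node failures, storage and network constraints, and so on.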

[Figure: orchestration (https://i.stack.imgur.com/wabH1.png)]

GKE (Google Container Engine, since renamed Google Kubernetes Engine) is hosted Kubernetes on Google Cloud Platform. It lets a user simply specify that they need an n-node Kubernetes cluster and exposes the cluster itself as a managed instance. Kubernetes is open source, so one could also set it up on Google Compute Engine, on a different cloud provider, or on one's own machines in one's own data center.
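
Because Kubernetes is the same system wherever it runs, the description of what to run is portable across hosted and self-managed clusters. The sketch below builds a minimal Deployment manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The field names follow the `apps/v1` Deployment API; the helper name `deployment_manifest`, the app name `web`, the image `nginx` and the replica count are placeholders for illustration.

```python
import json
from typing import Any, Dict


def deployment_manifest(name: str, image: str, replicas: int) -> Dict[str, Any]:
    """Build a minimal Kubernetes Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Desired number of identical pods; the orchestrator keeps it true.
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values; save the output to a file and apply it with kubectl.
    print(json.dumps(deployment_manifest("web", "nginx", 3), indent=2))
```

The same manifest should work unchanged whether the cluster is a GKE cluster or one set up by hand, which is the practical payoff of the orchestrator being open source.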

ECS (Amazon Elastic Container Service, the "AWS Container Service" in the question) is a proprietary container management/orchestration system built and operated by Amazon and available as part of the AWS suite.

174

answered Sep 29 '22 11:09

Anirudh Ramanathan
