Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why StatefulSets? Can't a stateless Pod use persistent volumes?

I am trying to understand Stateful Sets. How does their use differ from the use of "stateless" Pods with Persistent Volumes? That is, assuming that a "normal" Pod may lay claim to persistent storage, what obvious thing am I missing that requires this new construct (with ordered start/stop and so on)?

like image 893
Laird Nelson Avatar asked Jan 19 '17 02:01

Laird Nelson


People also ask

How does StatefulSets use differ from the use of stateless pods with persistent volumes?

Given this difference, Deployment is more suited to work with stateless applications. As far as a Deployment is concerned, Pods are interchangeable. While a StatefulSet keeps a unique identity for each Pod it manages. It uses the same identity whenever it needs to reschedule those Pods.

What's the difference between Kubernetes deployments and StatefulSets?

Deployments are used for stateless applications, StatefulSets for stateful applications. The pods in a deployment are interchangeable, whereas the pods in a StatefulSet are not. Deployments require a service to enable interaction with pods, while a headless service handles the pods' network ID in StatefulSets.

Why do StatefulSets use headless services?

For a StatefulSet to work, it needs a Headless Service. A Headless Service does not have an IP address. Internally, it creates the necessary endpoints to expose pods with DNS names. The StatefulSet definition includes a reference to the Headless Service, but you have to create it separately.

Why do we need StatefulSet in Kubernetes?

StatefulSet is the workload API object used to manage stateful applications. Manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods. Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec.


2 Answers

Yes, a regular pod can use a persistent volume. However, sometimes you have multiple pods that logically form a "group". Examples of this would be database replicas, ZooKeeper hosts, Kafka nodes, etc. In all of these cases there's a bunch of servers and they work together and talk to each other. What's special about them is that each individual in the group has an identity. For example, for a database cluster one is the master and two are followers and each of the followers communicates with the master letting it know what it has and has not synced. So the followers know that "db-x-0" is the master and the master knows that "db-x-2" is a follower and has all the data up to a certain point but still needs data beyond that.

In such situations you need a few things you can't easily get from a regular pod:

  1. A predictable name: you want to start your pods telling them where to find each other so they can form a cluster, elect a leader, etc. but you need to know their names in advance to do that. Normal pod names are random so you can't know them in advance.
  2. A stable address/DNS name: you want whatever names were available in step (1) to stay the same. If a normal pod restarts (you redeploy, the host where it was running dies, etc.) on another host it'll get a new name and a new IP address.
  3. A persistent link between an individual in the group and their persistent volume: if the host where one of your database master was running dies it'll get moved to a new host but should connect to the same persistent volume as there's one and only 1 volume that contains the right data for that "individual". So, for example, if you redeploy your group of 3 database hosts you want the same individual (by DNS name and IP address) to get the same persistent volume so the master is still the master and still has the same data, replica1 gets it's data, etc.

StatefulSets solve these issues because they provide (quoting from https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/):

  1. Stable, unique network identifiers.
  2. Stable, persistent storage.
  3. Ordered, graceful deployment and scaling.
  4. Ordered, graceful deletion and termination.

I didn't really talk about (3) and (4) but that can also help with clusters as you can tell the first one to deploy to become the master and the next one find the first and treat it as master, etc.

As some have noted, you can indeed can some of the same benefits by using regular pods and services, but its much more work. For example, if you wanted 3 database instances you could manually create 3 deployments and 3 services. Note that you must manually create 3 deployments as you can't have a service point to a single pod in a deployment. Then, to scale up you'd manually create another deployment and another service. This does work and was somewhat common practice before PetSet/PersistentSet came along. Note that it is missing some of the benefits listed above (persistent volume mapping & fixed start order for example).

like image 166
Oliver Dain Avatar answered Sep 21 '22 05:09

Oliver Dain


1: Why StatefulSets?

Stateless app: Usually, frontend components have completely different scaling requirements than the backends, so we tend to scale them individually. Not to mention the fact that backends such as databases are usually much harder to scale compared to (stateless) frontend web servers. Yes, the term “stateless” means that no past data nor state is stored or needs to be persistent when a new container is created

Stateful app: Stateful applications typically involve some database, such as Cassandra, MongoDB, or MySQL and processes a read and/or write to it.

2: Can't a stateless Pod use persistent volumes?

Basically, there are few ways by which you can do it. However, it has its own disadvantages.

1: USING ONE REPLICASET PER POD INSTANCE

  • you could create multiple ReplicaSets—one for each pod with each ReplicaSet’s desired replica count set to one, and each ReplicaSet’s pod template referencing a dedicated PersistentVolumeClaim.

enter image description here

  • Although this takes care of the automatic rescheduling in case of node failures or accidental pod deletions, it’s much more cumbersome compared to having a single ReplicaSet.

  • For example, think about how you’d scale the pods in that case. You couldn’t change the desired replica count you’d have to create additional ReplicaSets instead. Using multiple ReplicaSets is therefore not the best solution.

2: USING MULTIPLE DIRECTORIES IN THE SAME VOLUME

  • A trick you can use is to have all pods use the same PersistentVolume, but then have a separate file directory inside that volume for each pod Because you can’t configure pod replicas differently from a single pod template, you can’t tell each instance what directory it should use, but you can make each instance automatically select (and possibly also create) a data directory that isn’t being used by any other instance at that time.

enter image description here

  • This solution does require coordination between the instances, and isn’t easy to do correctly. It also makes the shared storage volume the bottleneck.

That's why one should encourage to use statefulsets

like image 31
Gupta Avatar answered Sep 21 '22 05:09

Gupta