I am trying to understand Stateful Sets. How does their use differ from the use of "stateless" Pods with Persistent Volumes? That is, assuming that a "normal" Pod may lay claim to persistent storage, what obvious thing am I missing that requires this new construct (with ordered start/stop and so on)?
Given this difference, a Deployment is better suited to stateless applications. As far as a Deployment is concerned, Pods are interchangeable. A StatefulSet, by contrast, keeps a unique identity for each Pod it manages, and it reuses that same identity whenever it needs to reschedule the Pod.
Deployments are used for stateless applications, StatefulSets for stateful applications. The pods in a deployment are interchangeable, whereas the pods in a StatefulSet are not. Deployments require a service to enable interaction with pods, while a headless service handles the pods' network ID in StatefulSets.
For a StatefulSet to work, it needs a Headless Service. A Headless Service has no cluster IP address (its clusterIP is set to None). Instead, it creates the endpoints needed to give each pod its own DNS name. The StatefulSet definition includes a reference to the Headless Service, but you have to create the Service separately.
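To make the naming concrete, here is a minimal Python sketch of the stable names a StatefulSet and its Headless Service produce. It assumes the default ordinal numbering (starting at 0) and the default cluster domain "cluster.local". The names "db-x", the "app" label, and port 5432 are made up for illustration, and the helper functions are hypothetical, not part of any Kubernetes client library.

```python
# Headless Service manifest expressed as a plain dict (illustrative values).
headless_service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "db-x"},
    "spec": {
        "clusterIP": "None",  # this is what makes the Service "headless"
        "selector": {"app": "db-x"},
        "ports": [{"name": "db", "port": 5432}],
    },
}


def pod_names(statefulset: str, replicas: int) -> list:
    """Pods are named <statefulset>-<ordinal>; ordinals start at 0 by default."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pod_dns_names(statefulset: str, service: str, replicas: int,
                  namespace: str = "default",
                  cluster_domain: str = "cluster.local") -> list:
    """Each pod gets <pod>.<service>.<namespace>.svc.<cluster-domain>."""
    return [
        f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
        for pod in pod_names(statefulset, replicas)
    ]


if __name__ == "__main__":
    service_name = headless_service["metadata"]["name"]
    for dns_name in pod_dns_names("db-x", service_name, 3):
        print(dns_name)
```

Because these names depend only on the StatefulSet name, the Service name, and the ordinal, a rescheduled pod comes back under the same DNS name, which is what lets the members of a cluster find each other.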
StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods. Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec.
Yes, a regular pod can use a persistent volume. However, sometimes you have multiple pods that logically form a "group". Examples of this would be database replicas, ZooKeeper hosts, Kafka nodes, etc. In all of these cases there's a bunch of servers and they work together and talk to each other. What's special about them is that each individual in the group has an identity. For example, for a database cluster one is the master and two are followers and each of the followers communicates with the master letting it know what it has and has not synced. So the followers know that "db-x-0" is the master and the master knows that "db-x-2" is a follower and has all the data up to a certain point but still needs data beyond that.
In such situations you need a few things you can't easily get from a regular pod:
StatefulSets solve these issues because they provide (quoting from https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/):
I didn't really talk about (3) and (4) but that can also help with clusters as you can tell the first one to deploy to become the master and the next one find the first and treat it as master, etc.
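As a rough illustration of the ordering point, the sketch below models the default ordered behaviour: pods are created one at a time from ordinal 0 upward and removed in reverse. The "ordinal 0 is the master" rule is only a hypothetical convention that an application could adopt on top of that ordering; Kubernetes itself does not assign roles.

```python
def startup_order(statefulset: str, replicas: int) -> list:
    """Pods are created sequentially: <name>-0 first, then <name>-1, and so on."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def shutdown_order(statefulset: str, replicas: int) -> list:
    """On scale-down or deletion, pods are removed in reverse ordinal order."""
    return list(reversed(startup_order(statefulset, replicas)))


def role_for(pod_name: str) -> str:
    """Hypothetical application convention: ordinal 0 is master, the rest follow it."""
    ordinal = int(pod_name.rsplit("-", 1)[1])
    return "master" if ordinal == 0 else "follower"


if __name__ == "__main__":
    for pod in startup_order("db-x", 3):
        print(pod, role_for(pod))
```

Since "db-x-0" always starts first and stops last, later pods can rely on it already existing when they come up.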
As some have noted, you can indeed get some of the same benefits by using regular pods and services, but it's much more work. For example, if you wanted 3 database instances you could manually create 3 deployments and 3 services. Note that you must manually create 3 deployments, as you can't have a service point to a single pod in a deployment. Then, to scale up, you'd manually create another deployment and another service. This does work and was somewhat common practice before PetSet/StatefulSet came along. Note that it is missing some of the benefits listed above (persistent volume mapping and fixed start order, for example).
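The difference in effort can be sketched by generating both sets of manifests as plain Python dicts. This is only an illustration under stated assumptions: the image name, port, mount path, storage size, and the "instance" label are all made up, and the helper functions are hypothetical. The one fact it relies on is that PersistentVolumeClaims created from volumeClaimTemplates are named <template>-<statefulset>-<ordinal>.

```python
def manual_objects(app: str, image: str, instances: int) -> list:
    """Workaround: one Deployment plus one Service per instance."""
    objects = []
    for i in range(instances):
        name = f"{app}-{i}"
        labels = {"app": app, "instance": name}  # "instance" label is illustrative
        objects.append({
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": name},
            "spec": {
                "replicas": 1,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {"containers": [{"name": app, "image": image}]},
                },
            },
        })
        objects.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": name},
            "spec": {"selector": labels, "ports": [{"port": 5432}]},
        })
    return objects


def statefulset(app: str, image: str, replicas: int, storage: str = "1Gi") -> dict:
    """A single StatefulSet; scaling is just a change to spec.replicas."""
    labels = {"app": app}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": app},
        "spec": {
            "serviceName": app,  # the Headless Service, created separately
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": app,
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": "/var/lib/data"}],
                }]},
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def pvc_names(sts: dict) -> list:
    """Claims are named <template>-<statefulset>-<ordinal>, so each pod keeps its own."""
    name = sts["metadata"]["name"]
    return [
        f"{template['metadata']['name']}-{name}-{i}"
        for template in sts["spec"]["volumeClaimTemplates"]
        for i in range(sts["spec"]["replicas"])
    ]


if __name__ == "__main__":
    print(len(manual_objects("db-x", "example/db:1", 3)), "objects to manage by hand")
    print(pvc_names(statefulset("db-x", "example/db:1", 3)))
```

With the manual approach, going from 3 to 4 instances means writing two more objects; with the StatefulSet it means changing one number, and the fourth pod gets its own claim automatically.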
1: Why StatefulSets?
Stateless app: Frontend components usually have completely different scaling requirements than backends, so we tend to scale them individually. Backends such as databases are also usually much harder to scale than (stateless) frontend web servers. The term “stateless” means that no past data or state is stored, or needs to persist, when a new container is created.
Stateful app: Stateful applications typically involve some database, such as Cassandra, MongoDB, or MySQL, and process reads and/or writes to it.
2: Can't a stateless Pod use persistent volumes?
Basically, there are a few ways to do it. However, each has its own disadvantages.
1: USING ONE REPLICASET PER POD INSTANCE
Although this takes care of the automatic rescheduling in case of node failures or accidental pod deletions, it’s much more cumbersome compared to having a single ReplicaSet.
For example, think about how you’d scale the pods in that case. You couldn’t just change the desired replica count; you’d have to create additional ReplicaSets instead. Using multiple ReplicaSets is therefore not the best solution.
2: USING MULTIPLE DIRECTORIES IN THE SAME VOLUME