k8s - Significance of ReplicaSet matchLabel selector

Tags:

kubernetes

Assume the Deployment, ReplicaSet, and Pod map 1:1:1.

deployment ==> replicaSet ==> Pod

When we create a Deployment, the Pods get a pod-template-hash label. That label looks sufficient for a ReplicaSet to check whether enough Pods are running. So what is the significance of the ReplicaSet matchLabels selector, and why is it mandatory?

To illustrate: I deploy an app with the selector below, and 2 Pods are running.

spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-app

Now I change the value of the pod-template-hash label on one of the Pods (to testing here). Another Pod starts immediately, so the ReplicaSet does not seem to care about selector.matchLabels.

NAME                            READY   STATUS    RESTARTS   AGE   LABELS
pod/nginx-app-b8b875889-cpnnr   1/1     Running   0          53s   app=nginx-app,pod-template-hash=testing
pod/nginx-app-b8b875889-jlk6m   1/1     Running   0          53s   app=nginx-app,pod-template-hash=b8b875889
pod/nginx-app-b8b875889-xblqr   1/1     Running   0          11s   app=nginx-app,pod-template-hash=b8b875889

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE    LABELS
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   151d   component=apiserver,provider=kubernetes

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE   LABELS
deployment.apps/nginx-app   2/2     2            2           53s   app=nginx-app

NAME                                  DESIRED   CURRENT   READY   AGE   LABELS
replicaset.apps/nginx-app-b8b875889   2         2         2       53s   app=nginx-app,pod-template-hash=b8b875889

RamPrakash asked Dec 21 '20 17:12


2 Answers

Let me summarize. The whole discussion is about one question: why does a Deployment force me to set a matchLabels selector even though it could easily live without one, since it adds pod-template-hash and could use only that?

After reading all the comments and the discussion, I decided to look at the Kubernetes documentation.

Let me quote the ReplicaSet documentation, starting with the section "How a ReplicaSet works":

[...]

A ReplicaSet is linked to its Pods via the Pods' metadata.ownerReferences field, which specifies what resource the current object is owned by. All Pods acquired by a ReplicaSet have their owning ReplicaSet's identifying information within their ownerReferences field. It's through this link that the ReplicaSet knows of the state of the Pods it is maintaining and plans accordingly.

So does this mean that it's not using labels at all? Well, not exactly. Let's keep reading:

A ReplicaSet identifies new Pods to acquire by using its selector. If there is a Pod that has no OwnerReference or the OwnerReference is not a Controller and it matches a ReplicaSet's selector, it will be immediately acquired by said ReplicaSet

Oh, so it looks like the selector is used only as an alternative to the first method.
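The acquisition rule quoted above can be sketched in a few lines of Python. This is only a simplified model, not the controller's actual code: the `Pod` class, the `controller_uid` field, and the pod names are assumptions made for illustration, and "no OwnerReference, or an OwnerReference that is not a Controller" is collapsed into `controller_uid is None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Pod:
    name: str
    labels: Dict[str, str]
    # UID of the owner marked as controller, or None for a bare/orphaned Pod.
    controller_uid: Optional[str] = None


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """matchLabels semantics: every selector key/value must be on the Pod."""
    return all(labels.get(key) == value for key, value in selector.items())


def can_acquire(selector: Dict[str, str], pod: Pod) -> bool:
    """A Pod is a candidate only if it has no controller and its labels match."""
    return pod.controller_uid is None and matches(selector, pod.labels)


selector = {"app": "nginx-app"}
bare = Pod("bare-pod", {"app": "nginx-app", "tier": "web"})
owned = Pod("owned-pod", {"app": "nginx-app"}, controller_uid="other-rs")
unrelated = Pod("unrelated-pod", {"app": "redis"})

print([can_acquire(selector, p) for p in (bare, owned, unrelated)])
# [True, False, False]
```

Extra labels on the Pod (such as `tier` here) do not prevent a match; only the keys named in the selector are compared.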

Let's keep reading. Here is a quote from the Pod Selector section:

Pod Selector

The .spec.selector field is a label selector. As discussed earlier these are the labels used to identify potential Pods to acquire

It looks like these labels are not the primary way to keep track of the Pods owned by the ReplicaSet; they are used to "identify potential Pods to acquire". But what does that mean?

Why would a ReplicaSet acquire Pods it does not own? A section of the documentation tries to answer this very question:

Non-Template Pod acquisitions

While you can create bare Pods with no problems, it is strongly recommended to make sure that the bare Pods do not have labels which match the selector of one of your ReplicaSets. The reason for this is because a ReplicaSet is not limited to owning Pods specified by its template-- it can acquire other Pods in the manner specified in the previous sections.

[...]

As those Pods do not have a Controller (or any object) as their owner reference and match the selector of the [...] ReplicaSet, they will immediately be acquired by it.
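To make the consequences concrete, here is a rough Python model of one reconcile pass. It is a sketch under stated assumptions, not the real controller: the `reconcile` function, the dict layout, and the names `rs-1` and `bare` are made up for illustration. It assumes the ReplicaSet created by the Deployment selects on both `app` and `pod-template-hash`, and that a ReplicaSet releases an owned Pod whose labels stop matching its selector, which is how the Kubernetes documentation describes isolating a Pod from a ReplicaSet.

```python
from typing import Dict, List, Tuple


def reconcile(rs_uid: str, selector: Dict[str, str], replicas: int,
              pods: List[dict]) -> Tuple[List[str], int]:
    """Return (names of Pods owned after claiming, number of Pods to create).

    A negative second value means that many Pods are surplus.
    """
    owned = []
    for pod in pods:
        hit = all(pod["labels"].get(k) == v for k, v in selector.items())
        if pod["owner"] == rs_uid and not hit:
            pod["owner"] = None      # release: labels no longer match
        elif pod["owner"] is None and hit:
            pod["owner"] = rs_uid    # adopt: matching Pod without a controller
        if pod["owner"] == rs_uid:
            owned.append(pod["name"])
    return owned, replicas - len(owned)


selector = {"app": "nginx-app", "pod-template-hash": "b8b875889"}

# The question's situation: one Pod was relabeled to pod-template-hash=testing.
relabeled = [
    {"name": "cpnnr", "owner": "rs-1",
     "labels": {"app": "nginx-app", "pod-template-hash": "testing"}},
    {"name": "jlk6m", "owner": "rs-1",
     "labels": {"app": "nginx-app", "pod-template-hash": "b8b875889"}},
]
print(reconcile("rs-1", selector, 2, relabeled))   # (['jlk6m'], 1)

# A bare Pod (hypothetical) that happens to carry matching labels.
with_bare = [
    {"name": "jlk6m", "owner": "rs-1", "labels": dict(selector)},
    {"name": "xblqr", "owner": "rs-1", "labels": dict(selector)},
    {"name": "bare", "owner": None, "labels": dict(selector)},
]
print(reconcile("rs-1", selector, 2, with_bare))   # (['jlk6m', 'xblqr', 'bare'], -1)
```

In the first case the relabeled Pod is released and one replacement is needed, which matches the extra Pod seen in the question. In the second case the bare Pod is adopted and the set ends up with one Pod too many.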

Great, but this still does not answer the question: Why do I need to provide the selector? Couldn't it just use that hash?

Some time ago there was a bug in k8s (https://github.com/kubernetes/kubernetes/issues/23170), so someone suggested that validation was needed (https://github.com/kubernetes/kubernetes/issues/23218), and validation was added (https://github.com/kubernetes/kubernetes/pull/23530).

It has stayed with us to this day, even though we could probably live without it now.

Still, I think it's better that it's there, because it minimizes the chance of overlapping labels if pod-template-hash collides across different ReplicaSets.

Matt answered Oct 18 '22 01:10


One reason to use the pod label AND pod-template-hash as the selector may be to manage ReplicaSets during updates, rollbacks, and so on.

For example:

In your scenario, the ReplicaSet currently uses the selector app=nginx-app,pod-template-hash=b8b875889. Suppose the Deployment is updated to a later version of the nginx image. As part of the upgrade, it creates a new ReplicaSet in the background with the same selector but a new pod-template-hash, so the new ReplicaSet's selector is "app=nginx-app,pod-template-hash=XXXXXXXX". During the upgrade, Pods from the old ReplicaSet are terminated and new Pods are created in the new ReplicaSet. Because the pod label (app=nginx-app) is common to both ReplicaSets, managing them effectively and independently requires an additional selector term that is unique to each ReplicaSet. Using pod-template-hash alongside the pod label achieves this.
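A small Python sketch of that rollout situation, for illustration only. The ReplicaSet names `old-rs` and `new-rs` and the value `newhash` are placeholders rather than real generated values, and `claimants` is a hypothetical helper that lists which selectors match a Pod's labels.

```python
from typing import Dict, List


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """matchLabels semantics: every selector key/value must be on the Pod."""
    return all(labels.get(key) == value for key, value in selector.items())


def claimants(selectors: Dict[str, Dict[str, str]],
              labels: Dict[str, str]) -> List[str]:
    """Names of the ReplicaSets whose selector matches the given Pod labels."""
    return [name for name, sel in selectors.items() if matches(sel, labels)]


old_pod = {"app": "nginx-app", "pod-template-hash": "b8b875889"}
new_pod = {"app": "nginx-app", "pod-template-hash": "newhash"}  # placeholder hash

# Selectors that include the hash, as during a rollout.
with_hash = {"old-rs": dict(old_pod), "new-rs": dict(new_pod)}
# What would happen if both ReplicaSets selected on the app label alone.
app_only = {"old-rs": {"app": "nginx-app"}, "new-rs": {"app": "nginx-app"}}

print(claimants(with_hash, old_pod))  # ['old-rs']
print(claimants(with_hash, new_pod))  # ['new-rs']
print(claimants(app_only, new_pod))   # ['old-rs', 'new-rs']
```

With the hash in the selector, each Pod is matched by exactly one ReplicaSet. With only the shared app label, both ReplicaSets would match every Pod and could not be managed independently.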

SAJEESH KUMAR answered Oct 18 '22 01:10