Background:
While performance testing an application, I was getting inconsistent results when scaling the replicas for my php-fpm
containers where I realized that 3/4 pods were scheduled on the same node.
I then configured anti affinity rules to not schedule pods on the same node. I quickly realized that using requiredDuringSchedulingIgnoredDuringExecution
was not an option because I could not have # of replicas > # of nodes so I configured preferredDuringSchedulingIgnoredDuringExecution
.
For the most part, it looks like my pods are scheduled evenly across all my nodes however sometimes (seen through a rolling upgrade), I see pods on the same node. I feel like the weight
value which is currently set to 100 is playing a factor.
Here is the yaml I am using (helm):
{{- if .Values.podAntiAffinity }}
{{- if .Values.podAntiAffinity.enabled }}
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: "{{ .Values.deploymentName }}"
topologyKey: "kubernetes.io/hostname"
{{- end }}
{{- end }}
Questions:
The way I read the documentation, the weight
number will be added to a calculated score for the node based on how busy it is (simplified) however what I don't understand is how a weight of 1 vs 100 would be any different?
Why are pods sometimes scheduled on the same node with this rule? Is it because the total score for the node that the pod wasn't scheduled on is too low (as it is too busy)?
Is there a way to see a log/event of how the pod was scheduled on a particular node? I'd expect kubectl describe pod
to have those details but seemingly it does not (except in an error scenario).
podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: security operator: In values: - S2 topologyKey: kubernetes.io/hostname.
The affinity rule says that the scheduler can only schedule a Pod onto a node if the node is in the same zone as one or more existing Pods with the label security=S1 .
topologyKey is the key of node labels. If two Nodes are labelled with this key and have identical values for that label, the scheduler treats both Nodes as being in the same topology. The scheduler tries to place a balanced number of Pods into each topology domain.
Affinities are used to express Pod scheduling constraints that can match characteristics of candidate Nodes and the Pods that are already running on those Nodes. A Pod that has an “affinity” to a given Node is more likely to be scheduled to it; conversely, an “anti-affinity” makes it less probable it'll be scheduled.
preferredDuringSchedulingIgnoredDuringExecution is not guaranteed.
two types of node affinity, called requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. You can think of them as “hard” and “soft” respectively, in the sense that the former specifies rules that must be met for a pod to be scheduled onto a node (just like nodeSelector but using a more expressive syntax), while the latter specifies preferences that the scheduler will try to enforce but will not guarantee.
The weight you set is giving an edge but there are other parameters (set by user and kubernetes) with their own weights. Below example should give a better picture where weight that you set matters
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: example.com/myLabel
operator: In
values:
- a
weight: 40
- preference:
matchExpressions:
- key: example.com/myLabel
operator: In
values:
- b
weight: 35
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With