How does weight affect pod scheduling when affinity rules are set?

Tags:

kubernetes

Background:

While performance testing an application, I got inconsistent results when scaling the replicas of my php-fpm containers, and I realized that 3/4 of the pods were scheduled on the same node.

I then configured anti-affinity rules so that pods would not be scheduled on the same node. I quickly realized that requiredDuringSchedulingIgnoredDuringExecution was not an option, because it would not let me run more replicas than I have nodes, so I configured preferredDuringSchedulingIgnoredDuringExecution instead.

For the most part, my pods appear to be scheduled evenly across all my nodes. Sometimes, however (I noticed this during a rolling upgrade), I see pods on the same node. I suspect that the weight value, currently set to 100, is a factor.

Here is the YAML I am using (Helm):

      {{- if .Values.podAntiAffinity }}
      {{- if .Values.podAntiAffinity.enabled }}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: "{{ .Values.deploymentName }}"
              topologyKey: "kubernetes.io/hostname"
      {{- end }}
      {{- end }}

Questions:

The way I read the documentation, the weight is added to a score calculated for the node based on how busy it is (simplified). What I don't understand is how a weight of 1 would behave any differently from a weight of 100.

Why are pods sometimes scheduled on the same node with this rule? Is it because the total score for the node that the pod wasn't scheduled on is too low (as it is too busy)?

Is there a way to see a log or event showing how the pod was scheduled onto a particular node? I'd expect kubectl describe pod to have those details, but seemingly it does not (except in an error scenario).


asked Jan 15 '20 21:01

leeman24

1 Answer

preferredDuringSchedulingIgnoredDuringExecution is not guaranteed.

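The documentation describes two types of node affinity, called requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. You can think of them as “hard” and “soft” respectively, in the sense that the former specifies rules that must be met for a pod to be scheduled onto a node (just like nodeSelector but using a more expressive syntax), while the latter specifies preferences that the scheduler will try to enforce but will not guarantee.

The weight you set gives matching nodes an edge, but it is only one input: other scoring criteria (some set by the user, some built into Kubernetes) carry their own weights. For each candidate node, the scheduler adds up the weights of the preferred terms that the node satisfies and combines that sum with the node's other scores. The example below gives a better picture of where the weight matters: a node labeled example.com/myLabel=a gains 40, while a node labeled example.com/myLabel=b gains 35, so all else being equal the scheduler favors the first.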

 affinity:
   nodeAffinity:
     preferredDuringSchedulingIgnoredDuringExecution:
     - preference:
         matchExpressions:
         - key: example.com/myLabel
           operator: In
           values:
           - a
       weight: 40
     - preference:
         matchExpressions:
         - key: example.com/myLabel
           operator: In
           values:
           - b
       weight: 35


answered Oct 22 '22 15:10

ffran09
