Is there any way to make Kubernetes distribute pods as evenly as possible? I have resource requests set on all deployments, global requests, and HPA. All nodes are identical.
I just had a situation where my ASG scaled down a node and one service became completely unavailable, because all 4 of its pods were on the node that was removed.
I would like each deployment to spread its pods across at least 2 nodes.
To distribute pods as evenly as possible across all cluster worker nodes, we can use the well-known node label kubernetes.io/hostname as the topology domain, which puts each worker node in its own topology domain.
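As an illustration only, here is a minimal sketch of that idea using topologySpreadConstraints (available since Kubernetes 1.19, and a Deployment under apps/v1); the say names and image simply mirror the examples further down and are not required:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: say
  template:
    metadata:
      labels:
        app: say
    spec:
      # Spread the pods of this Deployment across nodes: each node
      # (topology domain defined by kubernetes.io/hostname) may hold
      # at most 1 pod more than the least-loaded node.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway   # use DoNotSchedule for a hard requirement
        labelSelector:
          matchLabels:
            app: say
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080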
The key thing about pods is that when a pod contains multiple containers, all of them always run on a single worker node; a pod never spans multiple worker nodes.
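As a small sketch (the names and images here are arbitrary), both containers in the following Pod are always scheduled onto the same node and share its network namespace, so they can reach each other on localhost:

apiVersion: v1
kind: Pod
metadata:
  name: two-container-pod
spec:
  containers:
  # Both containers run on the same worker node.
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox:1.36
    command: ["sleep", "3600"]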
Pods on a node can communicate with all pods on all nodes without NAT. Agents on a node (system daemons, kubelet) can communicate with all the pods on that specific node.
Here I build on Anirudh's answer by adding example code.
My initial Kubernetes YAML looked like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
  - protocol: TCP
    port: 8080
  type: LoadBalancer
  externalIPs:
  - 192.168.0.112
At this point, the Kubernetes scheduler somehow decides that all 6 replicas should be deployed on the same node.
Then I added a requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule to force the pods to be scheduled on different nodes:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - say
            topologyKey: "kubernetes.io/hostname"
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
  - protocol: TCP
    port: 8080
  type: LoadBalancer
  externalIPs:
  - 192.168.0.112
Now each running pod is on a different node. Since I have 3 nodes and 6 replicas, the remaining 3 pods (6 minus 3) cannot be scheduled and stay Pending, because I made the rule a hard requirement with requiredDuringSchedulingIgnoredDuringExecution.
kubectl get pods -o wide
NAME                             READY     STATUS    RESTARTS   AGE       IP            NODE
say-deployment-8b46845d8-4zdw2   1/1       Running   0          24s       10.244.2.80   night
say-deployment-8b46845d8-699wg   0/1       Pending   0          24s       <none>        <none>
say-deployment-8b46845d8-7nvqp   1/1       Running   0          24s       10.244.1.72   gray
say-deployment-8b46845d8-bzw48   1/1       Running   0          24s       10.244.0.25   np3
say-deployment-8b46845d8-vwn8g   0/1       Pending   0          24s       <none>        <none>
say-deployment-8b46845d8-ws8lr   0/1       Pending   0          24s       <none>        <none>
Now if I loosen this requirement with preferredDuringSchedulingIgnoredDuringExecution:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: "app"
                  operator: In
                  values:
                  - say
              topologyKey: "kubernetes.io/hostname"
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
  - protocol: TCP
    port: 8080
  type: LoadBalancer
  externalIPs:
  - 192.168.0.112
The first 3 pods are deployed on 3 different nodes, just as in the previous case. The remaining 3 (6 pods minus 3 nodes) are placed on various nodes according to the scheduler's internal scoring.
NAME                              READY     STATUS    RESTARTS   AGE       IP            NODE
say-deployment-57cf5fb49b-26nvl   1/1       Running   0          59s       10.244.2.81   night
say-deployment-57cf5fb49b-2wnsc   1/1       Running   0          59s       10.244.0.27   np3
say-deployment-57cf5fb49b-6v24l   1/1       Running   0          59s       10.244.1.73   gray
say-deployment-57cf5fb49b-cxkbz   1/1       Running   0          59s       10.244.0.26   np3
say-deployment-57cf5fb49b-dxpcf   1/1       Running   0          59s       10.244.1.75   gray
say-deployment-57cf5fb49b-vv98p   1/1       Running   0          59s       10.244.1.74   gray