Imagine a cluster with one master and two worker nodes, where you deploy a service whose pods use pod anti-affinity across the nodes: a rolling update of the Deployment creates an additional pod, but the scheduler cannot place it anywhere, because both nodes already run a pod that the anti-affinity rule excludes.
Q: How could one more flexibly set the anti-affinity to allow the update?
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - api
      topologyKey: kubernetes.io/hostname
The pending pod then reports an error like:
No nodes are available that match all of the following predicates:: MatchInterPodAffinity (2), PodToleratesNodeTaints (1).
Look at Max Surge
If you set maxSurge = 0, you are telling Kubernetes that it may not create more pods than the number of replicas you have set up for the Deployment. This forces Kubernetes to remove an old pod before starting a new one, making room for the new pod first and getting you around the podAntiAffinity issue. I've used this mechanism myself, with great success.
Config example
apiVersion: apps/v1  # extensions/v1beta1 was removed in Kubernetes 1.16; use apps/v1
kind: Deployment
...
spec:
  replicas: <any number larger than 1>
  ...
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  ...
  template:
    ...
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - api
            topologyKey: kubernetes.io/hostname
Warning: Don't do this if you only have one replica, as it will cause downtime: the only pod will be removed before a new one is added. If you have a huge number of replicas, deployments will be slow, because Kubernetes can only upgrade one pod at a time; in that case you can crank up maxUnavailable to let Kubernetes remove more pods at once, as sketched below.
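For example, a sketch of a faster rollout for a Deployment with many replicas (the value 2 is purely illustrative; pick whatever temporary loss of capacity you can tolerate):

strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 2  # up to two old pods are removed at a time, speeding up the rollout
  type: RollingUpdate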
Here are a few methods which may work:
Max Unavailable
.spec.strategy.rollingUpdate.maxUnavailable is an optional field that specifies the maximum number of Pods that can be unavailable during the update process. The value can be an absolute number (for example, 5) or a percentage of desired Pods (for example, 10%). The absolute number is calculated from percentage by rounding down. The value cannot be 0 if .spec.strategy.rollingUpdate.maxSurge is 0. The default value is 25%.
For example, when this value is set to 30%, the old ReplicaSet can be scaled down to 70% of desired Pods immediately when the rolling update starts. Once new Pods are ready, the old ReplicaSet can be scaled down further, followed by scaling up the new ReplicaSet, ensuring that the total number of Pods available at all times during the update is at least 70% of the desired Pods.
Max Surge
.spec.strategy.rollingUpdate.maxSurge is an optional field that specifies the maximum number of Pods that can be created over the desired number of Pods. The value can be an absolute number (for example, 5) or a percentage of desired Pods (for example, 10%). The value cannot be 0 if maxUnavailable is 0. The absolute number is calculated from the percentage by rounding up. The default value is 25%.
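As a sketch, a percentage-based strategy matching the docs example above (the 30% values are illustrative):

strategy:
  rollingUpdate:
    maxUnavailable: 30%  # old ReplicaSet may be scaled down to 70% of desired pods
    maxSurge: 30%        # total pod count may briefly reach 130% of desired
  type: RollingUpdate

Note that maxSurge > 0 does not help against a hard podAntiAffinity rule when every node already hosts a matching pod: the surge pod simply has nowhere to schedule, which is why the maxSurge: 0 approach above works.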
Pod Anti-Affinity
As with node affinity, there are currently two types of pod affinity and anti-affinity, called requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. You can think of them as "hard" and "soft" respectively, in the sense that the former specifies rules that must be met for a pod to be scheduled onto a node (just like nodeSelector but using a more expressive syntax), while the latter specifies preferences that the scheduler will try to enforce but will not guarantee. Switching the anti-affinity to the "soft" form lets the scheduler temporarily co-locate pods during an update instead of leaving the new pod Pending, as sketched below.
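A sketch of the original rule rewritten as a soft preference (the weight of 100 is illustrative; higher weights make the scheduler try harder to honor the preference):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - api
        topologyKey: kubernetes.io/hostname

With this, the scheduler still spreads the api pods across nodes whenever possible, but during a rolling update the extra pod may land on a node that already runs one, so the update is never blocked.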