GKE kubernetes kube-system resources nodeAffinity

I have a multi-regional testing setup on GKE k8s 1.9.4. Every cluster has:

  • an ingress, configured with kubemci
  • 3 node pools with different node labels:
    • default-pool system (1vCPU / 2GB RAM)
    • frontend-pool frontend (2vCPU / 2GB RAM)
    • backend-pool backend (1vCPU / 600MB RAM)
  • an HPA that scales on a custom metric

So components such as prometheus-operator, prometheus-server, custom-metrics-api-server and kube-state-metrics are attached to nodes with the system label.

Frontend and backend pods are attached to nodes with the frontend and backend labels respectively (a single pod per node, see podAntiAffinity).

After autoscaling scales the backend or frontend pods down, their nodes stay around, because pods from the kube-system namespace (e.g. heapster) appear to be running on them. As a result, a node with the frontend / backend label stays alive after downscaling even when no backend or frontend pod is left on it.

The question is: how can I avoid creating kube-system pods on the nodes that serve my application (if this is really sane and possible)?

I guess I should use taints and tolerations for the backend and frontend nodes, but how can that be combined with HPA and the in-cluster node autoscaler?

1 Answer

Seems like taints and tolerations did the trick.

Create a cluster with a default node pool (for monitoring and kube-system):

gcloud container --project "my-project-id" clusters create "app-europe" \
  --zone "europe-west1-b" --username="admin" --cluster-version "1.9.4-gke.1" --machine-type "custom-2-4096" \
  --image-type "COS" --disk-size "10" --num-nodes "1" --network "default" --enable-cloud-logging --enable-cloud-monitoring \
  --maintenance-window "01:00" --node-labels=region=europe-west1,role=system

Create a node pool for your application:

gcloud container --project "my-project-id" node-pools create "frontend" \
      --cluster "app-europe" --zone "europe-west1-b" --machine-type "custom-2-2048" --image-type "COS" \
      --disk-size "10" --node-labels=region=europe-west1,role=frontend \
      --node-taints app=frontend:NoSchedule \
      --enable-autoscaling --num-nodes "1" --min-nodes="1" --max-nodes="3"
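
The question also has a backend pool, which needs the same treatment. Here is a minimal sketch that wraps the command above in a function, assuming the same project, cluster and zone. `create_app_pool` is a hypothetical helper name, and the machine type is passed in as an argument because only the frontend one is given here.

```shell
#!/bin/sh
# Hypothetical helper: creates one tainted, autoscaled node pool per application role.
# Project, cluster and zone are the values used in the commands above.
create_app_pool() {
  role="$1"          # pool name, role label and taint value, e.g. frontend or backend
  machine_type="$2"  # e.g. custom-2-2048
  gcloud container --project "my-project-id" node-pools create "$role" \
    --cluster "app-europe" --zone "europe-west1-b" \
    --machine-type "$machine_type" --image-type "COS" --disk-size "10" \
    --node-labels="region=europe-west1,role=$role" \
    --node-taints "app=$role:NoSchedule" \
    --enable-autoscaling --num-nodes "1" --min-nodes "1" --max-nodes "3"
}
```

For example, `create_app_pool frontend custom-2-2048` is equivalent to the command above. Call it again with `backend` and that pool's machine type to taint the backend nodes as well.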

Then add nodeAffinity and tolerations sections to the pod template spec in your deployment manifest:

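  # Goes under spec.template.spec of the Deployment.
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "frontend"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Separate nodeSelectorTerms are ORed: a node matching either term qualifies.
        - matchExpressions:
          - key: beta.kubernetes.io/instance-type
            operator: In
            values:
            - custom-2-2048
        - matchExpressions:
          - key: role
            operator: In
            values:
            - frontend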
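
To check the result, one option is to list which kube-system pods ended up on the application nodes. This is a rough sketch, assuming the nodes carry the `role` label from the commands above and that your kubectl supports `--field-selector` on `get`. `kube_system_pods_on_role` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical check: list kube-system pods scheduled on nodes with a given role label.
kube_system_pods_on_role() {
  role="$1"
  # jsonpath prints the matching node names separated by spaces.
  for node in $(kubectl get nodes -l "role=$role" -o jsonpath='{.items[*].metadata.name}'); do
    echo "== $node"
    # Only pods whose spec.nodeName equals this node are returned.
    kubectl get pods --namespace kube-system \
      --field-selector "spec.nodeName=$node" -o name
  done
}
```

Running `kube_system_pods_on_role frontend` prints each frontend node followed by the kube-system pods on it. Pods that tolerate the taint (for example some DaemonSet-managed ones) may still show up, but pods without a matching toleration, such as heapster, should no longer be scheduled there.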
