
Scheduling kube-dns on dedicated node pool

I have a cluster running on GCP that currently consists entirely of preemptible nodes. We're experiencing issues where kube-dns becomes unavailable (presumably because a node has been preempted). We'd like to improve the resilience of DNS by moving the kube-dns pods to more stable nodes.

Is it possible to schedule system-cluster-critical pods like kube-dns (or all pods in the kube-system namespace) on a node pool of only non-preemptible nodes? I'm wary of using affinity, anti-affinity, or taints, since these pods are auto-created at cluster bootstrapping and any changes made could be clobbered by a Kubernetes version upgrade. Is there a way to do this that will persist across upgrades?

asked Jun 19 '18 by Faun

People also ask

How do you schedule pods on different nodes?

You can add the nodeSelector field to your Pod specification and specify the node labels you want the target node to have. Kubernetes only schedules the Pod onto nodes that have each of the labels you specify.
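
For example, a minimal Pod spec using nodeSelector might look like the sketch below (the disktype=ssd label is purely illustrative; substitute a label your nodes actually carry):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  # Only nodes labelled disktype=ssd are eligible to run this Pod.
  nodeSelector:
    disktype: ssd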

What are the differences between CoreDNS and kube-dns?

CoreDNS runs as a single container per instance, whereas kube-dns runs three. kube-dns relies on dnsmasq for caching, which is single-threaded C code, while CoreDNS is multi-threaded Go. CoreDNS also enables negative caching in its default deployment.
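
For reference, the default CoreDNS ConfigMap ships a Corefile roughly like the sketch below (exact plugins differ between Kubernetes versions); the cache plugin is what provides the caching, including negative caching, mentioned above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        # Answers queries for in-cluster Services and Pods.
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        # Caches responses (positive and negative) for up to 30 seconds.
        cache 30
        # Anything else is forwarded to the node's upstream resolvers.
        forward . /etc/resolv.conf
        loop
        reload
    }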

Where does kube-dns run?

Kubernetes expects a service to be running inside the pod network that performs name resolution and acts as the primary name server within the cluster. This is provided by the kube-dns (or CoreDNS) Deployment, which runs as ordinary pods in the kube-system namespace.
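
Inside the cluster, that name server is reached through the kube-dns Service in kube-system, whose cluster IP is what gets written into each Pod's /etc/resolv.conf. A trimmed-down sketch (the clusterIP shown is only an example; the real value is allocated from your service CIDR):

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  # Example address only; yours is assigned from the cluster's service CIDR.
  clusterIP: 10.96.0.10
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP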


1 Answer

The solution was to use taints and tolerations in conjunction with node affinity. We created a second node pool, and added a taint to the preemptible pool.

Terraform config:

resource "google_container_node_pool" "preemptible_worker_pool" {
  node_config {
    ...
    preemptible     = true

    labels {
      preemptible = "true"
      dedicated   = "preemptible-worker-pool"
    }

    taint {
      key    = "dedicated"
      value  = "preemptible-worker-pool"
      effect = "NO_SCHEDULE"
    }
  }
}

We then added a toleration and nodeAffinity to our existing workloads so they run on the tainted node pool, effectively forcing the cluster-critical pods (which lack the toleration) onto the untainted, non-preemptible node pool.

Kubernetes config:

spec:
  template:
    spec:
      # The affinity + tolerations sections together allow and enforce that the workers are
      # run on dedicated nodes tainted with "dedicated=preemptible-worker-pool:NoSchedule".
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: dedicated
                operator: In
                values:
                - preemptible-worker-pool
      tolerations:
      - key: dedicated
        operator: "Equal"
        value: preemptible-worker-pool
        effect: "NoSchedule"
answered Sep 28 '22 by Faun