Kube flannel in CrashLoopBackOff status

We have just started creating our Kubernetes cluster.

Now we are trying to deploy Tiller, but we get an error:

NetworkPlugin cni failed to set up pod "tiller-deploy-64c9d747bd-br9j7_kube-system" network: open /run/flannel/subnet.env: no such file or directory

After that I run:

kubectl get pods --all-namespaces -o wide

The output is:

NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE       IP              NODE          NOMINATED NODE
kube-system   coredns-78fcdf6894-ksdvt               1/1       Running             2          7d        192.168.0.4     kube-master   <none>
kube-system   coredns-78fcdf6894-p4l9q               1/1       Running             2          7d        192.168.0.5     kube-master   <none>
kube-system   etcd-kube-master                       1/1       Running             2          7d        10.168.209.20   kube-master   <none>
kube-system   kube-apiserver-kube-master             1/1       Running             2          7d        10.168.209.20   kube-master   <none>
kube-system   kube-controller-manager-kube-master    1/1       Running             2          7d        10.168.209.20   kube-master   <none>
kube-system   kube-flannel-ds-amd64-42rl7            0/1       CrashLoopBackOff    2135       7d        10.168.209.17   node5         <none>
kube-system   kube-flannel-ds-amd64-5fx2p            0/1       CrashLoopBackOff    2164       7d        10.168.209.14   node2         <none>
kube-system   kube-flannel-ds-amd64-6bw5g            0/1       CrashLoopBackOff    2166       7d        10.168.209.15   node3         <none>
kube-system   kube-flannel-ds-amd64-hm826            1/1       Running             1          7d        10.168.209.20   kube-master   <none>
kube-system   kube-flannel-ds-amd64-thjps            0/1       CrashLoopBackOff    2160       7d        10.168.209.16   node4         <none>
kube-system   kube-flannel-ds-amd64-w99ch            0/1       CrashLoopBackOff    2166       7d        10.168.209.13   node1         <none>
kube-system   kube-proxy-d6v2n                       1/1       Running             0          7d        10.168.209.13   node1         <none>
kube-system   kube-proxy-lcckg                       1/1       Running             0          7d        10.168.209.16   node4         <none>
kube-system   kube-proxy-pgblx                       1/1       Running             1          7d        10.168.209.20   kube-master   <none>
kube-system   kube-proxy-rnqq5                       1/1       Running             0          7d        10.168.209.14   node2         <none>
kube-system   kube-proxy-wc959                       1/1       Running             0          7d        10.168.209.15   node3         <none>
kube-system   kube-proxy-wfqqs                       1/1       Running             0          7d        10.168.209.17   node5         <none>
kube-system   kube-scheduler-kube-master             1/1       Running             2          7d        10.168.209.20   kube-master   <none>
kube-system   kubernetes-dashboard-6948bdb78-97qcq   0/1       ContainerCreating   0          7d        <none>          node5         <none>
kube-system   tiller-deploy-64c9d747bd-br9j7         0/1       ContainerCreating   0          45m       <none>          node4         <none>

Some of the flannel pods are in CrashLoopBackOff status, for example kube-flannel-ds-amd64-42rl7.

When I run:

kubectl describe pod -n kube-system kube-flannel-ds-amd64-42rl7

the pod status shows Running:

Name:               kube-flannel-ds-amd64-42rl7
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               node5/10.168.209.17
Start Time:         Wed, 22 Aug 2018 16:47:10 +0300
Labels:             app=flannel
                    controller-revision-hash=911701653
                    pod-template-generation=1
                    tier=node
Annotations:        <none>
Status:             Running
IP:                 10.168.209.17
Controlled By:      DaemonSet/kube-flannel-ds-amd64
Init Containers:
  install-cni:
    Container ID:  docker://eb7ee47459a54d401969b1770ff45b39dc5768b0627eec79e189249790270169
    Image:         quay.io/coreos/flannel:v0.10.0-amd64
    Image ID:      docker-pullable://quay.io/coreos/flannel@sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /etc/kube-flannel/cni-conf.json
      /etc/cni/net.d/10-flannel.conflist
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 22 Aug 2018 16:47:24 +0300
      Finished:     Wed, 22 Aug 2018 16:47:24 +0300
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-9wmch (ro)
Containers:
  kube-flannel:
    Container ID:  docker://521b457c648baf10f01e26dd867b8628c0f0a0cc0ea416731de658e67628d54e
    Image:         quay.io/coreos/flannel:v0.10.0-amd64
    Image ID:      docker-pullable://quay.io/coreos/flannel@sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 30 Aug 2018 10:15:04 +0300
      Finished:     Thu, 30 Aug 2018 10:15:08 +0300
    Ready:          False
    Restart Count:  2136
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:       kube-flannel-ds-amd64-42rl7 (v1:metadata.name)
      POD_NAMESPACE:  kube-system (v1:metadata.namespace)
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run from run (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-9wmch (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /run
    HostPathType:
  cni:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  flannel-token-9wmch:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flannel-token-9wmch
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  beta.kubernetes.io/arch=amd64
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason   Age                  From            Message
  ----     ------   ----                 ----            -------
  Normal   Pulled   51m (x2128 over 7d)  kubelet, node5  Container image "quay.io/coreos/flannel:v0.10.0-amd64" already present on machine
  Warning  BackOff  1m (x48936 over 7d)  kubelet, node5  Back-off restarting failed container

Here is kube-controller-manager.yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --address=127.0.0.1
    - --allocate-node-cidrs=true
    - --cluster-cidr=192.168.0.0/24
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --node-cidr-mask-size=24
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --use-service-account-credentials=true
    image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.2
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10252
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-controller-manager
    resources:
      requests:
        cpu: 200m
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/kubernetes/controller-manager.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      name: flexvolume-dir
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/kubernetes/controller-manager.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      type: DirectoryOrCreate
    name: flexvolume-dir
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
status: {}

The OS is CentOS Linux release 7.5.1804.

Logs from one of the pods:

# kubectl logs --namespace kube-system kube-flannel-ds-amd64-5fx2p

main.go:475] Determining IP address of default interface
main.go:488] Using interface with name eth0 and address 10.168.209.14
main.go:505] Defaulting external address to interface address (10.168.209.14)
kube.go:131] Waiting 10m0s for node controller to sync
kube.go:294] Starting kube subnet manager
kube.go:138] Node controller sync successful
main.go:235] Created subnet manager: Kubernetes Subnet Manager - node2
main.go:238] Installing signal handlers
main.go:353] Found network config - Backend type: vxlan
vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
main.go:280] Error registering network: failed to acquire lease: node "node2" pod cidr not assigned
main.go:333] Stopping shutdownHandler...

Where is the error?

Asked Aug 30 '18 by Alexey Vashchenkov


2 Answers

Try this:

"Failed to acquire lease" simply means the pod didn't get a podCIDR. This happened to me as well: although the manifest on the master node had pod CIDR allocation enabled, it still wasn't working and flannel kept going into CrashLoopBackOff. This is what I did to fix it.

From the master node, first find out your flannel cluster CIDR:

sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep -i cluster-cidr

Output:

- --cluster-cidr=172.168.10.0/24
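
As a sanity check, this value should normally match the Network field in flannel's net-conf.json. Assuming the standard flannel manifest, where the kube-flannel-cfg ConfigMap (the name also appears in the pod description above) carries a net-conf.json key, you can inspect it with something like:

kubectl -n kube-system get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'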

Then run the following from the master node:

kubectl patch node slave-node-1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}'

where slave-node-1 is the node on which acquiring the lease is failing, and the podCIDR value is the CIDR you found with the previous command.
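
To confirm the patch took effect, you can read the podCIDR back from the node spec (again using slave-node-1 as the example node name):

kubectl get node slave-node-1 -o jsonpath='{.spec.podCIDR}'

Once the podCIDR is set, the flannel pod on that node should acquire its lease the next time it restarts.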

Hope this helps.

Answered Oct 28 '22 by PanDe


For flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init.
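
If you are initializing the cluster from scratch, that would look something like this (the CIDR here matches flannel's default Network in net-conf.json; if you use a different range, adjust the flannel ConfigMap to match):

kubeadm init --pod-network-cidr=10.244.0.0/16

For comparison, the kube-controller-manager.yaml in the question uses --cluster-cidr=192.168.0.0/24 with --node-cidr-mask-size=24, which leaves room for only a single node subnet; that would explain why only the master node received a podCIDR while the other nodes fail with "pod cidr not assigned".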

Answered Oct 28 '22 by abdelkhaliq bouharaoua