CrashLoopBackOff on kubernetes-dashboard

I'm a Kubernetes noob. I'm trying to follow some recipes to get a small cluster up and running, but I'm running into trouble.

I have a master and four nodes, all running Ubuntu 16.04.

installed docker on all nodes:

$ sudo apt-get update
$ sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
   "deb https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update && sudo apt-get install -y docker-ce=$(apt-cache madison docker-ce | grep 17.03 | head -1 | awk '{print $3}')

$ sudo docker version
Client:
 Version:       17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:17:40 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:16:13 2018
  OS/Arch:      linux/amd64
  Experimental: false

turned off swap on all nodes

$ sudo swapoff -a

commented out the swap mounts in /etc/fstab

$ sudo vi /etc/fstab
$ sudo mount -a
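
(A non-interactive way to do the same edit, as a sketch; it assumes the swap entries are ordinary fstab lines with filesystem type "swap":)

$ # comment out every fstab line whose filesystem type is swap, keeping a .bak backup
$ sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab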

installed kubeadm & kubectl on all nodes:

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ sudo apt-get update
$ sudo apt-get install -y kubeadm kubectl
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", 
GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", 
BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", 
Platform:"linux/amd64"}

downloaded and unpacked this into /usr/local/bin on master and all nodes: https://github.com/kubernetes-incubator/cri-tools/releases
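
(That amounted to roughly the following; the release tag and asset name here are assumptions, so check the releases page for the version matching your cluster:)

$ CRICTL_VERSION="v1.0.0-beta.0"    # hypothetical tag, pick the real one from the releases page
$ curl -L "https://github.com/kubernetes-incubator/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz" -o /tmp/crictl.tar.gz
$ sudo tar xzvf /tmp/crictl.tar.gz -C /usr/local/bin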

installed etcd 3.3.0 on all nodes:

$ sudo groupadd --system etcd
$ sudo useradd --home-dir "/var/lib/etcd" \
      --system \
      --shell /bin/false \
      -g etcd \
      etcd

$ sudo mkdir -p /etc/etcd
$ sudo chown etcd:etcd /etc/etcd
$ sudo mkdir -p /var/lib/etcd
$ sudo chown etcd:etcd /var/lib/etcd

$ sudo rm -rf /tmp/etcd && mkdir -p /tmp/etcd
$ sudo curl -L https://github.com/coreos/etcd/releases/download/v3.3.0/etcd-v3.3.0-linux-amd64.tar.gz -o /tmp/etcd-3.3.0-linux-amd64.tar.gz
$ sudo tar xzvf /tmp/etcd-3.3.0-linux-amd64.tar.gz -C /tmp/etcd --strip-components=1
$ sudo cp /tmp/etcd/etcd /usr/bin/etcd
$ sudo cp /tmp/etcd/etcdctl /usr/bin/etcdctl
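
(A quick sanity check that the copied binaries report the expected version:)

$ etcd --version
$ etcdctl --version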

noted the IP of the master:

$ sudo ifconfig -a eth0
eth0      Link encap:Ethernet  HWaddr 1e:00:51:00:00:28
          inet addr:172.20.43.30  Bcast:172.20.43.255  Mask:255.255.254.0
          inet6 addr: fe80::27b5:3d06:94c9:9d0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3194023 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3306456 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:338523846 (338.5 MB)  TX bytes:3682444019 (3.6 GB)

initialized kubernetes on the master:

$ sudo kubeadm init --pod-network-cidr=172.20.43.0/16 \
                    --apiserver-advertise-address=172.20.43.30 \
                    --ignore-preflight-errors=cri \
                    --kubernetes-version stable-1.9

[init] Using Kubernetes version: v1.9.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
        [WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [jenkins-kube-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.20.43.30]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 37.502640 seconds
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[markmaster] Will mark node jenkins-kube-master as master by adding a label and a taint
[markmaster] Master jenkins-kube-master tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: 6be4b1.9a8dacf89f71e53c
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token 6be4b1.9a8dacf89f71e53c 172.20.43.30:6443 --discovery-token-ca-cert-hash sha256:524d29b032d7bfd319b147ab03a936bd429805258425bccca749de71bcb1efaf

on the master node:

$ sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ export KUBECONFIG=$HOME/.kube/config
$ echo "export KUBECONFIG=$HOME/.kube/config" | tee -a ~/.bashrc

setup flannel for networking on master:

$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
clusterrole "flannel" created
clusterrolebinding "flannel" created
serviceaccount "flannel" created
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created

$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
clusterrole "flannel" configured
clusterrolebinding "flannel" configured

joined the nodes to the cluster by running this on each:

$ sudo kubeadm join --token 6be4b1.9a8dacf89f71e53c 172.20.43.30:6443 \
--discovery-token-ca-cert-hash sha256:524d29b032d7bfd319b147ab03a936bd429805258425bccca749de71bcb1efaf \
--ignore-preflight-errors=cri
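
(Back on the master, the new nodes should appear and go Ready once flannel is running on them:)

$ kubectl get nodes -o wide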

installed the dashboard on the master:

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role "kubernetes-dashboard-minimal" created
rolebinding "kubernetes-dashboard-minimal" created
deployment "kubernetes-dashboard" created
service "kubernetes-dashboard" created

started the proxy:

$ kubectl proxy
Starting to serve on 127.0.0.1:8001

opened another SSH session to the master with -L 8001:127.0.0.1:8001 and opened a local browser window to http://localhost:8001/ui
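
(The tunnel looked roughly like this; the username is a placeholder:)

$ ssh -L 8001:127.0.0.1:8001 <user>@172.20.43.30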

it redirects to http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ and says:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "no endpoints available for service \"https:kubernetes-    dashboard:\"",
  "reason": "ServiceUnavailable",
  "code": 503
}
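
(That 503 means the dashboard Service has no ready pods behind it; one way to confirm, assuming the stock manifest's namespace, is:)

$ kubectl -n kube-system get endpoints kubernetes-dashboard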

checking the pods ...

$ sudo kubectl get pods --all-namespaces
NAMESPACE     NAME                                          READY     STATUS             RESTARTS   AGE
default       guids-74487d79cf-zsj8q                        1/1       Running            0          4h
kube-system   etcd-jenkins-kube-master                      1/1       Running            1          21h
kube-system   kube-apiserver-jenkins-kube-master            1/1       Running            1          21h
kube-system   kube-controller-manager-jenkins-kube-master   1/1       Running            2          21h
kube-system   kube-dns-6f4fd4bdf-7pr9q                      3/3       Running            0          1d
kube-system   kube-flannel-ds-pvk8m                         1/1       Running            0          4h
kube-system   kube-flannel-ds-q4fsl                         1/1       Running            0          4h
kube-system   kube-flannel-ds-qhxn6                         1/1       Running            0          21h
kube-system   kube-flannel-ds-tkspz                         1/1       Running            0          4h
kube-system   kube-flannel-ds-vgqsb                         1/1       Running            0          4h
kube-system   kube-proxy-7np9b                              1/1       Running            0          4h
kube-system   kube-proxy-9lx8h                              1/1       Running            1          1d
kube-system   kube-proxy-f46d8                              1/1       Running            0          4h
kube-system   kube-proxy-fdtx9                              1/1       Running            0          4h
kube-system   kube-proxy-kmnjf                              1/1       Running            0          4h
kube-system   kube-scheduler-jenkins-kube-master            1/1       Running            1          21h
kube-system   kubernetes-dashboard-5bd6f767c7-xf42n         0/1       CrashLoopBackOff   53         4h

checking the log ...

$ sudo kubectl logs kubernetes-dashboard-5bd6f767c7-xf42n --namespace=kube-system
2018/03/20 17:56:25 Starting overwatch
2018/03/20 17:56:25 Using in-cluster config to connect to apiserver
2018/03/20 17:56:25 Using service account token for csrf signing
2018/03/20 17:56:25 No request provided. Skipping authorization
2018/03/20 17:56:55 Error while initializing connection to Kubernetes apiserver. 
This most likely means that the cluster is misconfigured (e.g., it has invalid
 apiserver certificates or service accounts configuration) or the 
--apiserver-host param points to a server that does not exist. 
Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

I find this reference to 10.96.0.1 rather odd. I don't have that on my network anywhere that I'm aware of.

I put the output of sudo kubectl describe pod --namespace=kube-system on pastebin: https://pastebin.com/cPppPkRw

Thanks in advance for any pointers.

-Steve Maring

Orlando, FL


1 Answer

--service-cluster-ip-range=10.96.0.0/12

Line 76 of your pastebin shows the Service CIDR to be exactly that, which squares with how Kubernetes thinks of the world: the .1 address in the Service CIDR is always the kubernetes Service (IIRC kube-dns gets a fairly low IP assignment too, but I can't recall whether it's fixed the way the kubernetes one is).
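
For illustration, the kubernetes Service occupies the first address of the Service CIDR, so on the master something like this should show it (output approximate):

$ kubectl get svc kubernetes -n default
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   1d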

You'll want to either change both the Service and Pod CIDRs to fit within the 10.244.0.0/16 subnet that flannel created as a side effect of deploying that yaml, or change flannel's ConfigMap (at your peril, now that the network has already been pushed into etcd) to align with the Service and Pod CIDRs you specified to your apiserver.
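
To see the mismatch concretely, you could compare flannel's Network setting with the ranges the control plane was started with (a sketch; the sample output is what I'd expect given your kubeadm init flags and the stock flannel manifest):

$ kubectl -n kube-system get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'
{ "Network": "10.244.0.0/16", "Backend": { "Type": "vxlan" } }

$ sudo grep -h -e 'service-cluster-ip-range' -e 'cluster-cidr' /etc/kubernetes/manifests/*.yaml
    - --service-cluster-ip-range=10.96.0.0/12
    - --cluster-cidr=172.20.43.0/16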

mdaniel