
Kubernetes: how to debug CrashLoopBackOff

I have the following setup:

- A Docker image omg/telperion on Docker Hub
- A Kubernetes cluster with 4 nodes (each with ~50 GB RAM) and plenty of resources

I followed tutorials to pull images from Docker Hub into Kubernetes:

SERVICE_NAME=telperion
DOCKER_SERVER="https://index.docker.io/v1/"
DOCKER_USERNAME=username
DOCKER_PASSWORD=password
DOCKER_EMAIL="[email protected]"

# Create secret
kubectl create secret docker-registry dockerhub \
  --docker-server=$DOCKER_SERVER \
  --docker-username=$DOCKER_USERNAME \
  --docker-password=$DOCKER_PASSWORD \
  --docker-email=$DOCKER_EMAIL

# Create service yaml
echo "apiVersion: v1 \n\
kind: Pod \n\
metadata: \n\
  name: ${SERVICE_NAME} \n\
spec: \n\
  containers: \n\
    - name: ${SERVICE_NAME} \n\
      image: omg/${SERVICE_NAME} \n\
      imagePullPolicy: Always \n\
      command: [ \"echo\",\"done deploying $SERVICE_NAME\" ] \n\
  imagePullSecrets: \n\
    - name: dockerhub" > $SERVICE_NAME.yaml

# Deploy to kubernetes
kubectl create -f $SERVICE_NAME.yaml
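As an aside (a sketch only, not part of the setup above): the same manifest can be generated with a shell heredoc, which avoids relying on echo expanding the \n escapes:

# Sketch: write the same Pod manifest via a heredoc instead of echo with "\n" escapes
cat > "$SERVICE_NAME.yaml" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${SERVICE_NAME}
spec:
  containers:
    - name: ${SERVICE_NAME}
      image: omg/${SERVICE_NAME}
      imagePullPolicy: Always
      command: ["echo", "done deploying ${SERVICE_NAME}"]
  imagePullSecrets:
    - name: dockerhub
EOF

kubectl create -f "$SERVICE_NAME.yaml"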

This results in the pod going into CrashLoopBackOff.

docker run -it -p8080:9546 omg/telperion works fine.

So my question is: is this debuggable, and if so, how do I debug it?

Some logs:

kubectl get nodes
NAME                    STATUS                     AGE       VERSION
k8s-agent-adb12ed9-0    Ready                      22h       v1.6.6
k8s-agent-adb12ed9-1    Ready                      22h       v1.6.6
k8s-agent-adb12ed9-2    Ready                      22h       v1.6.6
k8s-master-adb12ed9-0   Ready,SchedulingDisabled   22h       v1.6.6


kubectl get pods
NAME        READY     STATUS             RESTARTS   AGE
telperion   0/1       CrashLoopBackOff   10         28m


kubectl describe pod telperion
Name:           telperion
Namespace:      default
Node:           k8s-agent-adb12ed9-2/10.240.0.4
Start Time:     Wed, 21 Jun 2017 10:18:23 +0000
Labels:         <none>
Annotations:    <none>
Status:         Running
IP:             10.244.1.4
Controllers:    <none>
Containers:
  telperion:
    Container ID:       docker://c2dd021b3d619d1d4e2afafd7a71070e1e43132563fdc370e75008c0b876d567
    Image:              omg/telperion
    Image ID:           docker-pullable://omg/telperion@sha256:c7e3beb0457b33cd2043c62ea7b11ae44a5629a5279a88c086ff4853828a6d96
    Port:
    Command:
      echo
      done deploying telperion
    State:              Waiting
      Reason:           CrashLoopBackOff
    Last State:         Terminated
      Reason:           Completed
      Exit Code:        0
      Started:          Wed, 21 Jun 2017 10:19:25 +0000
      Finished:         Wed, 21 Jun 2017 10:19:25 +0000
    Ready:              False
    Restart Count:      3
    Environment:        <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-n7ll0 (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  default-token-n7ll0:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-n7ll0
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                           SubObjectPath               Type     Reason      Message
  ---------  --------  -----  ----                           -------------               ----     ------      -------
  1m         1m        1      default-scheduler                                          Normal   Scheduled   Successfully assigned telperion to k8s-agent-adb12ed9-2
  1m         1m        1      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Created     Created container with id d9aa21fd16b682698235e49adf80366f90d02628e7ed5d40a6e046aaaf7bf774
  1m         1m        1      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Started     Started container with id d9aa21fd16b682698235e49adf80366f90d02628e7ed5d40a6e046aaaf7bf774
  1m         1m        1      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Started     Started container with id c6c8f61016b06d0488e16bbac0c9285fed744b933112fd5d116e3e41c86db919
  1m         1m        1      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Created     Created container with id c6c8f61016b06d0488e16bbac0c9285fed744b933112fd5d116e3e41c86db919
  1m         1m        2      kubelet, k8s-agent-adb12ed9-2                              Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "telperion" with CrashLoopBackOff: "Back-off 10s restarting failed container=telperion pod=telperion_default(f4e36a12-566a-11e7-99a6-000d3aa32f49)"
  1m         1m        1      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Started     Started container with id 3b911f1273518b380bfcbc71c9b7b770826c0ce884ac876fdb208e7c952a4631
  1m         1m        1      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Created     Created container with id 3b911f1273518b380bfcbc71c9b7b770826c0ce884ac876fdb208e7c952a4631
  1m         1m        2      kubelet, k8s-agent-adb12ed9-2                              Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "telperion" with CrashLoopBackOff: "Back-off 20s restarting failed container=telperion pod=telperion_default(f4e36a12-566a-11e7-99a6-000d3aa32f49)"
  1m         50s       4      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Pulling     pulling image "omg/telperion"
  47s        47s       1      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Started     Started container with id c2dd021b3d619d1d4e2afafd7a71070e1e43132563fdc370e75008c0b876d567
  1m         47s       4      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Pulled      Successfully pulled image "omg/telperion"
  47s        47s       1      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Normal   Created     Created container with id c2dd021b3d619d1d4e2afafd7a71070e1e43132563fdc370e75008c0b876d567
  1m         9s        8      kubelet, k8s-agent-adb12ed9-2  spec.containers{telperion}  Warning  BackOff     Back-off restarting failed container
  46s        9s        4      kubelet, k8s-agent-adb12ed9-2                              Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "telperion" with CrashLoopBackOff: "Back-off 40s restarting failed container=telperion pod=telperion_default(f4e36a12-566a-11e7-99a6-000d3aa32f49)"

Edit 1: Errors reported by kubelet on master:

journalctl -u kubelet 


Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: E0621 10:28:49.798140    1809 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce with output
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: , stderr: du: cannot access '/var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce/merged/proc/13122/task/13122/fd/4': No such file or directory
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: du: cannot access '/var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce/merged/proc/13122/task/13122/fdinfo/4': No such file or directory
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: du: cannot access '/var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce/merged/proc/13122/fd/3': No such file or directory
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: du: cannot access '/var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce/merged/proc/13122/fdinfo/3': No such file or directory
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]:  - exit status 1, rootInodeErr: <nil>, extraDiskErr: <nil>

Edit 2: more logs

kubectl logs $SERVICE_NAME -p
done deploying telperion
asked Jun 21 '17 by ixaxaar


People also ask

How do I find out what caused CrashLoopBackOff?

Running kubectl get pods and then kubectl describe pod against your pods helps isolate the source of the CrashLoopBackOff status. From that output you can see when the error started, what changes were recently made to the infrastructure, and whether a new API authentication policy is involved.
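For example, against the pod from this question (substitute your own pod name and namespace):

# List pods and their status, then inspect the events and last container state
kubectl get pods
kubectl describe pod telperion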

What does CrashLoopBackOff mean in Kubernetes?

CrashLoopBackOff means the container in the pod has failed or exited unexpectedly, typically with a non-zero exit code, and Kubernetes keeps restarting it with an increasing back-off delay. There are a couple of ways to check this: get the logs for the pod using kubectl logs, and see the Kubernetes documentation on debugging Pods and ReplicationControllers and on determining the reason for pod failure.
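A minimal example, using the pod name from this question (the --previous flag reads the crashed instance):

# Logs of the current container instance
kubectl logs telperion
# Logs of the previous (crashed) container instance
kubectl logs telperion --previous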

How do I fix back restarting failed container?

If you receive the "Back-Off restarting failed container" message, your container probably exited soon after Kubernetes started it. If the liveness probe isn't returning a successful status, verify that the probe is configured correctly for the application.
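As a rough sketch, a liveness probe in the pod spec could look like the following; the /healthz path and port 9546 are assumptions for illustration, not something taken from the asker's image:

spec:
  containers:
    - name: telperion
      image: omg/telperion
      livenessProbe:
        httpGet:
          path: /healthz        # assumed health endpoint
          port: 9546            # assumed container port
        initialDelaySeconds: 10 # wait before the first probe
        periodSeconds: 15       # probe interval
        failureThreshold: 3     # container restarts after 3 consecutive failures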


1 Answer

You can access the logs of your pods with

kubectl logs [podname] -p 

The -p option reads the logs of the previous (crashed) instance.

If the crash comes from the application, you should have useful logs in there.
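If the logs alone aren't enough, the last termination state of the container can also be inspected directly; a small sketch using the pod from this question:

# Show exit code, reason and timestamps of the last terminated container instance
kubectl get pod telperion -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'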

answered Oct 13 '22 by Fabien