
What is the reason for "Back-off restarting failed container" for an Elasticsearch Kubernetes pod?

When I try to run my Elasticsearch container through a Kubernetes Deployment, the pod fails after some time, while it runs perfectly fine when run directly as a Docker container using docker-compose or a Dockerfile. This is what I get as a result of kubectl get pods:

NAME                  READY     STATUS    RESTARTS   AGE
es-764bd45bb6-w4ckn   0/1       Error     4          3m

Below is the result of kubectl describe pod:

Name:           es-764bd45bb6-w4ckn
Namespace:      default
Node:           administrator-thinkpad-l480/<node_ip>
Start Time:     Thu, 30 Aug 2018 16:38:08 +0530
Labels:         io.kompose.service=es
            pod-template-hash=3206801662
Annotations:    <none> 
Status:         Running
IP:             10.32.0.8
Controlled By:  ReplicaSet/es-764bd45bb6
Containers:
es:
Container ID:   docker://9be2f7d6eb5d7793908852423716152b8cefa22ee2bb06fbbe69faee6f6aa3c3
Image:          docker.elastic.co/elasticsearch/elasticsearch:6.2.4
Image ID:       docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:9ae20c753f18e27d1dd167b8675ba95de20b1f1ae5999aae5077fa2daf38919e
Port:           9200/TCP
State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
  Exit Code:    78
  Started:      Thu, 30 Aug 2018 16:42:56 +0530
  Finished:     Thu, 30 Aug 2018 16:43:07 +0530
Ready:          False
Restart Count:  5
Environment:
  ELASTICSEARCH_ADVERTISED_HOST_NAME:  es
  ES_JAVA_OPTS:                        -Xms2g -Xmx2g
  ES_HEAP_SIZE:                        2GB
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from default-token-nhb9z (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-nhb9z:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-nhb9z
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
             node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age               From           Message
  ----     ------     ----              ----           -------
 Normal   Scheduled  6m                default-scheduler                     Successfully assigned default/es-764bd45bb6-w4ckn to administrator-thinkpad-l480
 Normal   Pulled     3m (x5 over 6m)   kubelet, administrator-thinkpad-l480  Container image "docker.elastic.co/elasticsearch/elasticsearch:6.2.4" already present on machine
 Normal   Created    3m (x5 over 6m)   kubelet, administrator-thinkpad-l480  Created container
 Normal   Started    3m (x5 over 6m)   kubelet, administrator-thinkpad-l480  Started container
 Warning  BackOff    1m (x15 over 5m)  kubelet, administrator-thinkpad-l480  Back-off restarting failed container

Here is my elasticsearch-deployment.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.1.0 (36652f6)
  creationTimestamp: null
  labels:
    io.kompose.service: es
  name: es
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: es
    spec:
      containers:
      - env:
        - name: ELASTICSEARCH_ADVERTISED_HOST_NAME
          value: es
        - name: ES_JAVA_OPTS
          value: -Xms2g -Xmx2g
        - name: ES_HEAP_SIZE
          value: 2GB
        image: docker.elastic.co/elasticsearch/elasticsearch:6.2.4
        name: es
        ports:
        - containerPort: 9200
        resources: {}
      restartPolicy: Always
status: {}

When I try to get logs using kubectl logs -f es-764bd45bb6-w4ckn, I get

Error from server: Get https://<slave node ip>:10250/containerLogs/default/es-764bd45bb6-w4ckn/es?previous=true: dial tcp <slave node ip>:10250: i/o timeout 

What could be the reason and solution for this problem?

asked Aug 30 '18 by Lakshya Garg
People also ask

What causes back off restarting failed container?

Check for "Back-off restarting failed container": run kubectl describe pod [name]. If you get "Liveness probe failed" and "Back-off restarting failed container" messages from the kubelet, this indicates that the container is not responding and is in the process of being restarted.
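As a sketch (reusing the pod name from the question above purely as an example), those messages show up in the pod's events:

kubectl describe pod es-764bd45bb6-w4ckn
# or query only the events attached to that pod:
kubectl get events --field-selector involvedObject.name=es-764bd45bb6-w4ckn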

What causes a Kubernetes pod to restart?

When a container runs out of memory (OOM), it is restarted according to the pod's restart policy. With the default restart policy, the kubelet eventually backs off on restarting the container if it fails many times in a short time span.

What happens when a Kubernetes pod fails?

If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance.


2 Answers

I had the same problem; there can be a couple of reasons for this issue. In my case a jar file was missing. @Lakshya has already answered this particular problem; I would like to add the steps you can take to troubleshoot it:

  1. Get the pod status - kubectl get pods
  2. Describe the pod for a closer look - kubectl describe pod "pod-name". The last few lines of the output give you the events and show where your deployment failed.
  3. Get logs for more details - kubectl logs "pod-name"
  4. Get container logs - kubectl logs "pod-name" -c "container-name". Get the container name from the output of the describe pod command.

If your container is up, you can use kubectl exec -it to analyse the container further.
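As a concrete sketch, that sequence might look like this against the pod from the question (the pod name es-764bd45bb6-w4ckn and container name es are taken from the describe output above):

kubectl get pods                                   # overall pod status
kubectl describe pod es-764bd45bb6-w4ckn           # events at the bottom explain the failure
kubectl logs es-764bd45bb6-w4ckn                   # logs of the current container
kubectl logs es-764bd45bb6-w4ckn -c es --previous  # logs of the previous (crashed) attempt
kubectl exec -it es-764bd45bb6-w4ckn -- sh         # shell into the container if it stays running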

Hope it helps community members in future issues.

answered by Pradeep

I found the logs using docker logs for the es container and saw that Elasticsearch was not starting because vm.max_map_count was set to a very low value. I changed vm.max_map_count to the required value using sysctl -w vm.max_map_count=262144 and the pod started after that.
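For reference, a minimal sketch of applying that setting on the node and making it survive a reboot (the file name under /etc/sysctl.d/ is arbitrary):

sudo sysctl -w vm.max_map_count=262144                                        # apply immediately, as above
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-elasticsearch.conf # persist across reboots

Note that this has to be done on the node itself (or via a privileged init container), not inside the Elasticsearch container.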

answered by Lakshya Garg (the asker)