K8S Pod with startupProbe and initialDelaySeconds specified waits too long to become Ready

I have been trying to debug a very odd delay in my K8S deployments and have tracked it down to the simple reproduction below. It appears that if I set initialDelaySeconds on a startup probe, or leave it at 0 and the probe fails once, the probe is not run again for a while, and the pod takes at least 1-1.5 minutes to reach the Ready:true state.

I am running locally on Ubuntu 18.04 with microk8s v1.19.3 and the following versions:

kubelet: v1.19.3-34+a56971609ff35a
kube-proxy: v1.19.3-34+a56971609ff35a
containerd://1.3.7

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: microbot
  name: microbot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: microbot
  strategy: {}
  template:
    metadata:
      labels:
        app: microbot
    spec:
      containers:
      - image: cdkbot/microbot-amd64
        name: microbot
        command: ["/bin/sh"]
        args: ["-c", "sleep 3; /start_nginx.sh"]
        #args: ["-c", "/start_nginx.sh"]
        ports:
        - containerPort: 80
        startupProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 0  # 5 also has same issue
          periodSeconds: 1
          failureThreshold: 10
          successThreshold: 1
        ##livenessProbe:
        ##  httpGet:
        ##    path: /
        ##    port: 80
        ##  initialDelaySeconds: 0
        ##  periodSeconds: 10
        ##  failureThreshold: 1
        resources: {}
      restartPolicy: Always
      serviceAccountName: ""
status: {}
---
apiVersion: v1
kind: Service
metadata:
  name: microbot
  labels:
    app: microbot
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: microbot

The issue is that if I have any delay in the startupProbe, or if there is an initial failure, the pod reaches the Initialized:true state but has Ready:False and ContainersReady:False. It stays in this state for 1-1.5 minutes. I haven't found a pattern to the settings.

I left in the commented-out settings as well so you can see what I am trying to get to here. I have a container whose service takes a few seconds to start. I want to tell the startupProbe to wait a little bit and then check every second to see if we are ready to go. The configuration seems to work, but there is a baked-in delay that I can't track down. Even after the startup probe is passing, the pod does not transition to Ready for more than a minute.

Is there some setting elsewhere in k8s that is delaying the amount of time before a Pod can move into Ready if it isn't Ready initially?

Any ideas are greatly appreciated.

asked Dec 02 '20 17:12

Allen

1 Answer

Actually, I made a mistake in the comments: you can use initialDelaySeconds in a startupProbe, but you should prefer failureThreshold and periodSeconds instead.

As mentioned here

Kubernetes Probes

Kubernetes supports readiness and liveness probes for versions ≤ 1.15. Startup probes were added in 1.16 as an alpha feature and graduated to beta in 1.18 (WARNING: 1.16 deprecated several Kubernetes APIs. Use this migration guide to check for compatibility). All the probes have the following parameters:

initialDelaySeconds: number of seconds to wait before initiating liveness or readiness probes

periodSeconds: how often to check the probe

timeoutSeconds: number of seconds before marking the probe as timing out (failing the health check)

successThreshold: minimum number of consecutive successful checks for the probe to pass

failureThreshold: number of retries before marking the probe as failed. For liveness probes, this will lead to the pod restarting. For readiness probes, this will mark the pod as unready.
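
As a rough illustration of where these parameters sit in a container spec, here is a minimal sketch of a readiness probe. The /healthz path, port 8080, and the initialDelaySeconds value are placeholders chosen for the example, not recommendations; the other four values happen to match the Kubernetes defaults.

```yaml
readinessProbe:
  httpGet:
    path: /healthz          # assumed endpoint, for illustration only
    port: 8080              # assumed port, for illustration only
  initialDelaySeconds: 5    # wait 5s after the container starts before the first check
  periodSeconds: 10         # then run the check every 10s
  timeoutSeconds: 1         # a check that takes longer than 1s counts as a failure
  successThreshold: 1       # one successful check marks the probe as passing
  failureThreshold: 3       # three consecutive failures mark the pod as unready
```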

So why should you use failureThreshold and periodSeconds?

Consider an application where it occasionally needs to download large amounts of data or do an expensive operation at the start of the process. Since initialDelaySeconds is a static number, we are forced to always take the worst-case scenario (or extend the failureThreshold that may affect long-running behavior) and wait for a long time even when that application does not need to carry out long-running initialization steps. With startup probes, we can instead configure failureThreshold and periodSeconds to model this uncertainty better. For example, setting failureThreshold to 15 and periodSeconds to 5 means the application will get 15 (fifteen) x 5 (five) = 75s to start up before it fails.
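
The 15 x 5 example above might look like this in a container spec. This is only a sketch: the /healthz path and port 8080 are assumed for illustration.

```yaml
startupProbe:
  httpGet:
    path: /healthz         # assumed endpoint
    port: 8080             # assumed port
  periodSeconds: 5         # probe every 5s
  failureThreshold: 15     # tolerate up to 15 failures: 15 * 5 = 75s to start
```

A container that comes up quickly passes on an early probe and does not have to wait out the full 75s, which is the advantage over a fixed initialDelaySeconds.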

Additionally, if you need more information, take a look at this article on Medium.

Quoted from the Kubernetes documentation, "Protect slow starting containers with startup probes":

Sometimes, you have to deal with legacy applications that might require an additional startup time on their first initialization. In such cases, it can be tricky to set up liveness probe parameters without compromising the fast response to deadlocks that motivated such a probe. The trick is to set up a startup probe with the same command, HTTP or TCP check, with a failureThreshold * periodSeconds long enough to cover the worst-case startup time.

So, the previous example would become:

ports:
- name: liveness-port
  containerPort: 8080
  hostPort: 8080

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 1
  periodSeconds: 10

startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10

Thanks to the startup probe, the application will have a maximum of 5 minutes (30 * 10 = 300s) to finish its startup. Once the startup probe has succeeded once, the liveness probe takes over to provide a fast response to container deadlocks. If the startup probe never succeeds, the container is killed after 300s and subject to the pod's restartPolicy.
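
Applied to the microbot container from the question, the same pattern could be sketched as follows. The values are taken from the question's manifest, with initialDelaySeconds dropped in favour of failureThreshold * periodSeconds. This illustrates the recommended shape only and is not a confirmed fix for the delay described in the question.

```yaml
containers:
- image: cdkbot/microbot-amd64
  name: microbot
  command: ["/bin/sh"]
  args: ["-c", "sleep 3; /start_nginx.sh"]   # simulated slow start from the question
  ports:
  - containerPort: 80
  startupProbe:
    httpGet:
      path: /
      port: 80
    periodSeconds: 1       # check every second
    failureThreshold: 10   # up to 10 * 1 = 10s for nginx to come up
  livenessProbe:
    httpGet:
      path: /
      port: 80
    periodSeconds: 10
    failureThreshold: 1    # restart after a single failed check
```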

answered Sep 17 '22 00:09

Jakub

Tags: kubernetes, kubernetes-pod, kubernetes-deployment, microk8s
