
Appropriate Kubernetes Readiness and Liveness Probes for Kestrel .NET Core Web API

Our aim is to horizontally scale a .NET Core 2.0 Web API using Kubernetes. The Web API application will be served by Kestrel.

It looks like we can gracefully handle the termination of pods by configuring Kestrel's shutdown timeout, so now we are looking into how to probe the application to determine readiness and liveness.
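For reference, the shutdown timeout can be set on the web host in Program.cs, roughly like this (the 30-second value is just an illustration):

using System;
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            // Give in-flight requests up to 30 seconds to finish after SIGTERM.
            .UseShutdownTimeout(TimeSpan.FromSeconds(30))
            .UseStartup<Startup>()
            .Build()
            .Run();
}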

Would it be enough to simply probe the Web API with an HTTP request? If so, would it be a good idea to create a new healthcheck controller to handle these probing requests, or would it make more sense to probe an actual endpoint that would be consumed in normal use?

What should we consider when differentiating between the liveness and readiness probes?

asked Dec 06 '17 by Alasdair Stark




2 Answers

I would recommend performing health checks through separate endpoints. In general, there are a number of good reasons for doing so, for example:

  1. Checking that the application is live/ready or, more generally, in a healthy state is not necessarily the same as sending a user request to your web service. When performing health checks you should define what makes your web service healthy: this could mean, for example, checking access to external resources, like a database.
  2. It is easier to control who can actually call your health-check endpoints.
  3. More generally, you do not want health checks to interfere with the actual service functionality: otherwise you would need to rethink your health checks every time the service's functionality changes. E.g. if your service interacts with a database, a health check should verify that the connection to the database is fine, without caring about the data being manipulated internally by your service.
  4. Things get even more complicated if your web service is not stateless: in that case, you will need to make sure data remains consistent independently of your health checks.

As you pointed out, a good way to address all of the above is to set up a separate Controller to handle health checks.
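A minimal sketch of such a controller might look like the following (CheckDependencies is a placeholder; what you actually verify depends on your service):

using Microsoft.AspNetCore.Mvc;

[Route("health")]
public class HealthController : Controller
{
    [HttpGet]
    public IActionResult Get()
    {
        // Placeholder: verify external dependencies (database, downstream services, ...).
        bool healthy = CheckDependencies();

        // 200 marks the instance healthy to Kubernetes, 503 unhealthy.
        if (!healthy)
        {
            return StatusCode(503);
        }
        return Ok("OK");
    }

    private static bool CheckDependencies() => true;
}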

As an alternative, there is a standard library for enabling Health Checks on an ASP.NET Core web service. At the time of writing this answer, it is not officially part of ASP.NET Core and no NuGet packages are available yet, but it is planned for a future release: it is currently scheduled to ship with ASP.NET Core 2.2, as described in the ASP.NET Core 2.2 Roadmap. For now, you can pull the code from the Official Repository and include it in your solution as explained in the Microsoft documentation.

I personally find it very elegant: you configure everything through Startup.cs and Program.cs and do not need to explicitly create a new endpoint, as the library already handles that for you.

I have been using it in a few projects and I would definitely recommend it. The repository includes an example specific to ASP.NET Core projects that you can use to get up to speed quickly.
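For reference, the API as it eventually shipped in ASP.NET Core 2.2 looks roughly like this (the "self" check is a trivial placeholder for your real dependency checks):

using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddMvc();
        // Register health checks; "self" simply reports the process as healthy.
        services.AddHealthChecks()
            .AddCheck("self", () => HealthCheckResult.Healthy());
    }

    public void Configure(IApplicationBuilder app)
    {
        // The middleware exposes the endpoint; no dedicated controller is needed.
        app.UseHealthChecks("/health");
        app.UseMvc();
    }
}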

Liveness vs Readiness

In Kubernetes, you may then set up liveness and readiness probes over HTTP. As explained in the Kubernetes documentation, while the setup for both is almost identical, Kubernetes takes different actions depending on which probe fails:

Liveness probe from Kubernetes documentation:

Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy such situations.

Readiness probe from Kubernetes documentation:

Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup. In such cases, you don’t want to kill the application, but you don’t want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.

So, while an unhealthy response to a liveness probe causes the container to be restarted, an unhealthy response to a readiness probe simply causes the Pod to stop receiving traffic until it gets back to a healthy status.

What to consider when differentiating liveness and readiness probes?

For the liveness probe: I would recommend defining what makes your application healthy, i.e. the minimum requirements for user consumption, and implementing health checks based on that. This typically involves external resources or applications running as separate processes, e.g. databases, web services, etc. You may define health checks using the ASP.NET Core Health Checks library or manually with a separate Controller.

For the readiness probe: you simply want to hit your service to verify that it responds in time, so that Kubernetes can balance traffic accordingly. Trivially (and in most cases, as suggested by Lukas in another answer), you may use the exact same endpoint you use for liveness while setting up different timeouts, but this really depends on your needs and requirements.
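If you do want separate endpoints for the two probes, a rough sketch following the split described above might look like this (SqlConnection and the "Default" connection string name are assumptions; swap in whatever dependencies your service actually has):

using System.Data.SqlClient;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Configuration;

[Route("health")]
public class ProbeController : Controller
{
    private readonly string _connectionString;

    public ProbeController(IConfiguration configuration)
    {
        // "Default" is a hypothetical connection string name.
        _connectionString = configuration.GetConnectionString("Default");
    }

    // Liveness: verify the minimum requirements for user consumption,
    // here represented by database connectivity.
    [HttpGet("live")]
    public async Task<IActionResult> Live()
    {
        try
        {
            using (var connection = new SqlConnection(_connectionString))
            {
                await connection.OpenAsync();
            }
            return Ok("OK");
        }
        catch (SqlException)
        {
            return StatusCode(503);
        }
    }

    // Readiness: just verify the service responds in time.
    [HttpGet("ready")]
    public IActionResult Ready() => Ok("OK");
}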

answered Oct 13 '22 by smn.tino


Would it be enough to simply probe the Web API with an HTTP request? If so, would it be a good idea to create a new healthcheck controller to handle these probing requests?

My recommendation would be to provide a /health endpoint in your application, separate from your application endpoints. This is useful if you want to block consumers from calling your internal health endpoint. You can then configure Kubernetes to query your HTTP /health endpoint as in the example below.

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - name: http
      containerPort: 8080
    readinessProbe:
      httpGet:
        port: http
        path: /health
      # Give the application 60 seconds to start before the first probe.
      initialDelaySeconds: 60
    livenessProbe:
      httpGet:
        port: http
        path: /health

Inside your /health endpoint you should check the internal state of your application and return a status code of either 200 if everything is OK or 503 if your application is having issues. Keep in mind that health checks are usually performed every 10 to 15 seconds for every instance (Kubernetes' periodSeconds defaults to 10), so if you perform expensive operations to determine your application's state you might slow down your application.
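To keep frequent probes cheap, one option is to cache the result of an expensive dependency check for a few seconds. A rough sketch using IMemoryCache (the 10-second window and CheckDependencies are illustrative assumptions; services.AddMemoryCache() must be registered in Startup.cs):

using System;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Caching.Memory;

[Route("health")]
public class HealthController : Controller
{
    private readonly IMemoryCache _cache;

    public HealthController(IMemoryCache cache) => _cache = cache;

    [HttpGet]
    public IActionResult Get()
    {
        // The expensive check runs at most once every 10 seconds;
        // in between, probes are answered from the cache.
        bool healthy = _cache.GetOrCreate("health", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(10);
            return CheckDependencies();
        });

        return healthy ? (IActionResult)Ok("OK") : StatusCode(503);
    }

    private static bool CheckDependencies()
    {
        // Placeholder: probe the database, message broker, etc.
        return true;
    }
}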

What should we consider when differentiating between the liveness and readiness probes

Usually the only difference between the liveness and readiness probes is the timing configured for each. If your application needs 60 seconds to start, for example, you would set the readiness probe's initialDelaySeconds to 60 while keeping the liveness probe's defaults.

answered Oct 13 '22 by Lukas Eichler