
Reasons for OOMKilled in Kubernetes

Tags:

kubernetes

I'm trying to get a general understanding of OOMKilled events, and I've found 2 different reasons:

  1. Pod memory limit exceeded: If the Container continues to consume memory beyond its limit, the Container is terminated.

  2. Node out of memory: If the kubelet is unable to reclaim memory prior to a node experiencing system OOM, ... then kills the container ...

Questions

  • Is this correct?
  • Are there any other reasons?
  • Is it possible to see which reason caused the OOMKilled? (It's important to know the reason, because the remedy will be different.)
Matthias M asked Jun 22 '20 16:06



2 Answers

This is related to Kubernetes QoS (Quality of Service) classes.

TL;DR: there are 3 different classes:

BestEffort: A pod with no resource requests or limits defined. It is the first to get killed when the node runs out of resources.

Burstable: When you set resource requests and limits to different values. The requested amount is assured, but anything the pod needs to "burst" above its request (up to the limit) is shared with other pods and depends on how much is in use at that point, so it is not guaranteed.

Guaranteed: When you set the resource requests and limits to the same values. In that case the resources are assured to the pod, and if the node gets short of resources it will be the last to be killed.
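The three classes above can be sketched as a small classifier. This is only an illustration of the rules, not the kubelet's actual code: the function name `qos_class` and the plain-dict input are made up for the example, and quantities are compared as literal strings (so "1Gi" and "1024Mi" would wrongly count as different). The class Kubernetes actually assigned is recorded in the pod's `status.qosClass` field.

```python
def qos_class(containers):
    """Classify a pod as BestEffort, Burstable, or Guaranteed.

    `containers` is a list of dicts shaped like the `resources` field of a
    container spec, e.g. {"requests": {"memory": "1Gi"}, "limits": {...}}.
    Hypothetical helper for illustration only.
    """
    any_set = False      # does any container set any cpu/memory value?
    guaranteed = True    # do all containers have request == limit for both?
    for c in containers:
        requests = c.get("requests") or {}
        limits = c.get("limits") or {}
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    print(qos_class([{"requests": {"memory": "1Gi"}, "limits": {"memory": "2Gi"}}]))
```

A pod that sets only a memory request and a higher memory limit, as in the call above, lands in Burstable because it has no CPU limit and its memory request differs from its limit.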

paltaa answered Nov 20 '22 08:11


The two problems result in different error states:

1: An exceeded pod memory limit causes an OOMKilled termination.

2: Node out of memory causes a MemoryPressure condition and pod eviction.

kubectl describe pod mypod-xxxx

...
Reason:         Evicted
Message:        Pod The node had condition: [MemoryPressure].
...
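To tell the two cases apart programmatically, one option is to inspect the pod's status. The sketch below assumes the pod has been dumped with `kubectl get pod mypod-xxxx -o json` and parsed into a dict; the function name `termination_cause` and its return labels are made up for illustration. It relies on eviction being reported on the pod itself (`status.reason`), while an OOM kill is reported per container under `state.terminated` or `lastState.terminated`.

```python
def termination_cause(pod):
    """Return a short label for why a pod or one of its containers stopped.

    `pod` is a dict parsed from `kubectl get pod <name> -o json`.
    Hypothetical helper for illustration only.
    """
    status = pod.get("status", {})
    # Eviction is reported on the pod itself.
    if status.get("reason") == "Evicted":
        return "evicted: " + status.get("message", "")
    # An OOM kill is reported per container, in the current or previous state.
    for cs in status.get("containerStatuses", []):
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                return "oomkilled: container " + cs.get("name", "?")
    return "unknown"


if __name__ == "__main__":
    evicted_pod = {
        "status": {
            "phase": "Failed",
            "reason": "Evicted",
            "message": "Pod The node had condition: [MemoryPressure].",
        }
    }
    print(termination_cause(evicted_pod))
```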
Matthias M answered Nov 20 '22 09:11