I'm trying to get a general understanding of OOMKilled events, and I've found 2 different reasons:
Pod memory limit exceeded: If the Container continues to consume memory beyond its limit, the Container is terminated.
Node out of memory: If the kubelet is unable to reclaim memory prior to a node experiencing system OOM, ... then kills the container ...
Questions
If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container can be restarted, the kubelet restarts it, as with any other type of runtime failure.
A restarting container can indicate problems with memory (see the Out of Memory section), CPU usage, or just an application exiting prematurely. If a container is being restarted because of CPU usage, try increasing the requested and limit amounts for CPU in the pod spec.
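As a rough sketch of what that looks like (the pod name, image, and numbers here are placeholders, not taken from the original answer), the CPU request and limit live in the container's resources stanza of the pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo            # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx            # placeholder image
    resources:
      requests:
        cpu: "500m"         # raise the CPU share the scheduler reserves
      limits:
        cpu: "1"            # raise the ceiling at which CPU gets throttled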
However, there are several reasons for pod failure; some of them are the following: a wrong image used for the pod, wrong command/arguments passed to the pod, or the kubelet failing the pod's liveness check (i.e., the liveness probe failed).
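For the liveness-probe case specifically, a minimal probe definition might look like the following; the endpoint path, port, and timings are assumptions for illustration only:

    livenessProbe:            # part of a container spec
      httpGet:
        path: /healthz        # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10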
If a container attempts to exceed its specified CPU limit, the system throttles the container; exceeding the memory limit, by contrast, gets the container killed (OOMKilled) rather than throttled.
The OOMKilled error, also indicated by exit code 137, means that a container or pod was terminated because it used more memory than allowed. OOM stands for “Out Of Memory”. Kubernetes allows pods to limit the resources their containers are allowed to utilize on the host machine.
Well, it’s complicated. Kubernetes will not schedule pods onto a node if the sum of their memory requests exceeds the memory available on that node. But limits can be higher than requests, so the sum of all limits can be higher than the node's capacity. This is called overcommit, and it is very common.
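One way to see how overcommitted a node is (the node name below is a placeholder):

kubectl describe node my-node-1 | grep -A 8 "Allocated resources"
# The Limits column can legitimately show more than 100% of the node's
# allocatable memory, while the Requests column stays at or below 100%.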
A pod can specify a memory limit (the maximum amount of memory the container is allowed to use) and a memory request (the minimum amount of memory the container is expected to use).
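In a pod spec, both values go into the container's resources block; a minimal sketch with made-up sizes:

    resources:
      requests:
        memory: "128Mi"     # what the scheduler reserves for the container
      limits:
        memory: "256Mi"     # exceeding this gets the container OOMKilled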
Killing the container frees memory and relieves the memory pressure. This is by far the simplest memory error you can have in a pod: you set a memory limit, one container tries to allocate more memory than is allowed, and it gets an error. This usually ends up with the container dying, the pod becoming unhealthy, and Kubernetes restarting that pod.
This is related to Kubernetes QoS (Quality of Service).
TL;DR: there are 3 different classes (see the example specs after this list):
BestEffort: a pod with no resource requests or limits defined; it is the first to get killed when the node runs out of resources.
Burstable: when you set resource requests and limits to different values. The requested amount is assured, but the headroom between request and limit is "burst" capacity shared with other workloads; whether it is available depends on how much of the node is in use at that point, so it is not guaranteed.
Guaranteed: when you set resource requests and limits to the same values. In that case the resources are assured to the pod, and if the node runs short of resources these pods are the last to be killed.
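As a sketch of how the three classes fall out of the requests/limits settings (the sizes and the pod name are invented):

# Guaranteed: requests equal limits for every resource in every container
    resources:
      requests: { cpu: "500m", memory: "256Mi" }
      limits:   { cpu: "500m", memory: "256Mi" }

# Burstable: requests set lower than limits (or only some values set)
    resources:
      requests: { memory: "128Mi" }
      limits:   { memory: "512Mi" }

# BestEffort: no requests or limits on any container at all

You can confirm which class a pod ended up in:

kubectl get pod mypod -o jsonpath='{.status.qosClass}'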
Both problems result in different error states:
1: An exceeded pod memory limit causes an OOMKilled termination of the container.
2: Node out of memory causes a MemoryPressure node condition and pod eviction.
kubectl describe pod mypod-xxxx
...
Reason: Evicted
Message: Pod The node had condition: [MemoryPressure].
...
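For comparison, case 1 usually surfaces under the container's last state in kubectl describe pod rather than as an eviction; a trimmed, illustrative output:

kubectl describe pod mypod-xxxx
...
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
...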