I am working on a java service that basically creates files in a network file system to store data. It runs in a k8s cluster in a Ubuntu 18.04 LTS. When we began to limit the memory in kubernetes (limits: memory: 3Gi), the pods began to be OOMKilled by kubernetes.
At the beginning we thought it was a leak of memory in the java process, but analyzing more deeply we noticed that the problem is the memory of the kernel. We validated that looking at the file /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
We isolated the case to only create files (without java) with the DD command like this:
for i in {1..50000}; do dd if=/dev/urandom bs=4096 count=1 of=file$i; done
And with the dd command we saw that the same thing happened ( the kernel memory grew until OOM). After k8s restarted the pod, I got doing a describe pod:
Creating files cause the kernel memory grows, deleting those files cause the memory decreases . But our services store data , so it creates a lot of files continuously, until the pod is killed and restarted because OOMKilled.
We tested limiting the kernel memory using a stand alone docker with the --kernel-memory parameter and it worked as expected. The kernel memory grew to the limit and did not rise anymore. But we did not find any way to do that in a kubernetes cluster. Is there a way to limit the kernel memory in a K8S environment ? Why the creation of files causes the kernel memory grows and it is not released ?
OOMKilled is actually not native to Kubernetes—it is a feature of the Linux Kernel, known as the OOM Killer , which Kubernetes uses to manage container lifecycles. The OOM Killer mechanism monitors node memory and selects processes that are taking up too much memory, and should be killed.
Each node in your cluster must have at least 300 MiB of memory.
If a container attempts to exceed the specified limit, the system will throttle the container.
A restarting container can indicate problems with memory (see the Out of Memory section), cpu usage, or just an application exiting prematurely. If a container is being restarted because of CPU usage, try increasing the requested and limit amounts for CPU in the pod spec.
Thanks for all this info, it was very useful!
On my app, I solved this by creating a new side container that runs a cron job, every 5 minutes with the following command:
echo 3 > /proc/sys/vm/drop_caches
(note that you need the side container to run in privileged mode)
It works nicely and has the advantage of being predictable: every 5 minutes, your memory cache will be cleared.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With