Kubernetes Pod OOMKilled Solution

I have a service running on Kubernetes that processes files passed in from another resource. A single file's size can vary between 10 MB and 1 GB.

Recently I've been seeing the pod die with an OOMKilled error:

State: Running
Started: Sun, 11 Nov 2018 07:28:46 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Fri, 09 Nov 2018 18:49:46 +0000
Finished: Sun, 11 Nov 2018 07:28:45 +0000

I mitigated the issue by bumping the resource (memory) limit on the pod. But I am concerned that whenever there is a traffic or file-size spike, we will run into this OOMKilled issue again. On the other hand, if I set the memory limit too high, I am concerned it will cause trouble on the host of this pod.
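
For reference, the limit is set per container in the pod spec; roughly like this (the names and sizes here are placeholders, not my real values):

apiVersion: v1
kind: Pod
metadata:
  name: file-processor              # placeholder name
spec:
  containers:
  - name: processor
    image: example/file-processor:latest   # placeholder image
    resources:
      requests:
        memory: "512Mi"   # what the scheduler reserves for the pod
      limits:
        memory: "2Gi"     # the OOM-kill threshold for the container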

I read through the best practices given by Kubernetes: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#best-practices. But I am not sure whether adding --eviction-hard and --system-reserved=memory would resolve the issue.
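
As I understand it, those are kubelet flags rather than pod settings, so on each node they would look something like this (the thresholds are the docs' example values, not ones I've tested):

kubelet --eviction-hard=memory.available<500Mi \
        --system-reserved=memory=1Gi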

Has anyone had experience with a similar issue before?

Any help would be appreciated.

asked Nov 12 '18 by Edward
People also ask

How do you resolve a memory issue in Kubernetes?

If an application has a memory leak or tries to use more memory than a set limit amount, Kubernetes will terminate it with an “OOMKilled—Container limit reached” event and Exit Code 137. When you see a message like this, you have two choices: increase the limit for the pod or start debugging.

What happens when a pod hits its memory limit?

If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container can be restarted, the kubelet restarts it, as with any other type of runtime failure.


1 Answer

More than a Kubernetes/container-runtime issue, this is about memory management in your application, and it will vary depending on your language runtime, for example whether something like the JVM is running your application.

You generally want to set an upper limit on memory usage in the application itself, for example a maximum heap size in your JVM, and then leave a little headroom for garbage collection and overruns.
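
For instance, a rough sketch (the image name, heap size, and limit are illustrative, not tuned values): cap the JVM heap well below the container limit and leave the rest for metaspace, thread stacks, and GC overhead. JAVA_TOOL_OPTIONS is a standard JVM hook for passing flags without changing the image's entrypoint:

apiVersion: v1
kind: Pod
metadata:
  name: jvm-app                     # placeholder name
spec:
  containers:
  - name: app
    image: example/jvm-app:latest   # placeholder image
    env:
    - name: JAVA_TOOL_OPTIONS
      value: "-Xmx1536m"   # heap capped well below the container limit
    resources:
      limits:
        memory: "2Gi"      # ~25% headroom left for off-heap usage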

Another example is the Go runtime; the Go team has discussed runtime memory management, but there was no built-in solution as of this writing. For cases like this, it might be good to manually set a ulimit on virtual memory for the specific process of your application (if you have a leak, you will see other types of errors instead of an OOM kill), or to use a tool like timeout.
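
A sketch of the ulimit approach, wrapping the process in a shell so the limit applies only to it (the binary path and size are made up, and note ulimit -v takes KiB in most shells):

containers:
- name: app
  image: example/go-app:latest    # placeholder image
  command: ["/bin/sh", "-c"]
  args:
  - |
    ulimit -v 2097152     # cap virtual memory at ~2 GiB (units are KiB)
    exec /app/processor   # hypothetical binary path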

There's also manual cgroup management, but then again that's exactly what Docker and Kubernetes are supposed to do for you.
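
For completeness, the manual version looks something like this (cgroup v1 memory controller, run as root; the group name and PID are made up):

# create a memory cgroup and cap it at 512 MiB
mkdir /sys/fs/cgroup/memory/myapp
echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/memory/myapp/memory.limit_in_bytes
# move an already-running process (PID 1234, illustrative) into it
echo 1234 > /sys/fs/cgroup/memory/myapp/cgroup.procs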

This is a good article with some insights about managing a JVM in containers.

answered Sep 27 '22 by Rico