 

Kubernetes, simple SpringBoot app OOMKilled

I'm working with OpenJDK 11 and a very simple SpringBoot application; about the only thing it has is the SpringBoot Actuator enabled, so I can call /actuator/health, etc.

I also have a very simple Kubernetes cluster on GCE with just one pod running a single container (containing this app, of course).

My configuration has a couple of key points I want to highlight. It sets memory requests and limits:

resources:
  limits:
    memory: 600Mi
  requests:
    memory: 128Mi

And it has a readiness probe

readinessProbe:
  initialDelaySeconds: 30
  periodSeconds: 30
  httpGet:
    path: /actuator/health
    port: 8080

I'm also setting JVM_OPTS like this (which my program is, of course, using):

env:
- name: JVM_OPTS
  value: "-XX:MaxRAM=512m"

The problem

I launch this and it gets OOMKilled in about 3 hours every time!

I'm never calling anything myself; the only traffic is the readiness probe that Kubernetes fires every 30 seconds. Is that really enough to exhaust the memory? I haven't implemented anything out of the ordinary either, just a GET method that says hello world, plus the SpringBoot dependencies needed for the actuators.
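
For reference, the whole application is essentially the following (a minimal sketch; class and endpoint names are illustrative, and /actuator/health comes from the spring-boot-starter-actuator dependency):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Minimal hello-world app; the actuator starter provides /actuator/health.
@SpringBootApplication
@RestController
public class DemoApplication {

    @GetMapping("/hello")
    public String hello() {
        return "hello world";
    }

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}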

If I run kubectl top pod XXXXXX, I can actually see the memory usage gradually grow bigger and bigger.

I have tried a lot of different configurations, tips, etc., but nothing seems to work with a basic SpringBoot app.

Is there a way to actually hard-limit the memory so that Java can raise an OutOfMemoryError? Or to prevent this from happening altogether?
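
To illustrate what I'm asking for (these are just flags I have read about, not something I have verified fixes this): an explicit heap cap plus -XX:+ExitOnOutOfMemoryError should at least turn the failure into a Java-level OutOfMemoryError instead of a silent kill by the kernel, for example:

env:
- name: JVM_OPTS
  # Cap the heap explicitly and exit on OOM, so the failure shows up as a Java
  # OutOfMemoryError rather than an OOMKilled from the cgroup limit.
  value: "-Xmx256m -XX:+ExitOnOutOfMemoryError"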

Thanks in advance


EDIT: After 15h running

NAME                        READY   STATUS    RESTARTS   AGE
pod/test-79fd5c5b59-56654   1/1     Running   4          15h

describe pod says...

State:          Running
  Started:      Wed, 27 Feb 2019 10:29:09 +0000
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Wed, 27 Feb 2019 06:27:39 +0000
  Finished:     Wed, 27 Feb 2019 10:29:08 +0000

That last span is about 4 hours, during which only 483 calls were made to /actuator/health; apparently that was enough to make Java exceed the MaxRAM hint?


EDIT: Almost 17h

It's about to die again:

$ kubectl top pod test-79fd5c5b59-56654

NAME                    CPU(cores)   MEMORY(bytes)   
test-79fd5c5b59-56654   43m          575Mi      

EDIT: Losing any hope at 23h

NAME                        READY   STATUS    RESTARTS   AGE
pod/test-79fd5c5b59-56654   1/1     Running   6          23h

describe pod:

State:          Running
  Started:      Wed, 27 Feb 2019 18:01:45 +0000
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Wed, 27 Feb 2019 14:12:09 +0000
  Finished:     Wed, 27 Feb 2019 18:01:44 +0000

EDIT: A new finding

Last night I did some interesting reading:

https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
https://banzaicloud.com/blog/java10-container-sizing/
https://medium.com/adorsys/jvm-memory-settings-in-a-container-environment-64b0840e1d9e

TL;DR: I decided to remove the memory limit and start the process again. The result was quite interesting (after about 11 hours of running):

NAME                    CPU(cores)   MEMORY(bytes)   
test-84ff9d9bd9-77xmh   218m         1122Mi  

So... what is going on with that CPU? I was kind of expecting a big memory number, but what's happening with the CPU?

The only thing I can think of is that the GC is running like crazy because it thinks MaxRAM is 512m while the process is actually using more than 1G. I'm wondering: is Java detecting the ergonomics correctly? (I'm starting to doubt it.)
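
One way I could check what the ergonomics actually picked (a sketch, to be run inside the container; the grep pattern is just for readability):

# Print the heap size the JVM derives from -XX:MaxRAM=512m
java -XX:MaxRAM=512m -XX:+PrintFlagsFinal -version | grep -iE 'maxheapsize|maxram'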

To test my theory I set a limit of 512m and deployed the app that way, and I found that from the start there is an unusual CPU load, which has to be the GC running very frequently:

kubectl create ...

limitrange/mem-limit-range created 
pod/test created

kubectl exec -it test-64ccb87fd7-5ltb6 /usr/bin/free
              total        used        free      shared  buff/cache   available
Mem:        7658200     1141412     4132708       19948     2384080     6202496
Swap:             0           0           0

kubectl top pod ..
NAME                    CPU(cores)   MEMORY(bytes)   
test-64ccb87fd7-5ltb6   522m         283Mi    
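
One way to confirm whether that CPU really is the GC (a sketch; the JVM's PID inside the container is a placeholder):

# Sample GC utilisation once per second; rapidly growing FGC/FGCT columns would
# confirm the collector is running almost constantly.
kubectl exec -it test-64ccb87fd7-5ltb6 -- jstat -gcutil <PID> 1000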

522m is too much vCPU, so my logical next step was to make sure I was using the most appropriate GC for this case. I changed the JVM_OPTS this way:

  env:
  - name: JVM_OPTS
    value: "-XX:MaxRAM=512m -Xmx128m -XX:+UseSerialGC"
  ...
    resources:
      requests:
        memory: 256Mi
        cpu: 0.15
      limits:
        memory: 700Mi

And that brings the vCPU usage back to a reasonable level again; after kubectl top pod:

NAME                    CPU(cores)   MEMORY(bytes)   
test-84f4c7445f-kzvd5   13m          305Mi 

Messing with Xmx while MaxRAM is set obviously affects the JVM, but how can it not be possible to control the amount of memory we have in virtualized containers? I know the free command reports the host's available RAM, but OpenJDK should be using cgroups, right?
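
One way I could verify the cgroup detection directly (a sketch; requires JDK 10+ unified logging, and the pod name is a placeholder):

# Ask the JVM to log what it detected from the container's cgroups at startup
kubectl exec -it <pod> -- java -Xlog:os+container=info -version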

I'm still monitoring the memory ...


EDIT: A new hope

I did two things. The first was to remove my container limit again, because I want to analyze how much it grows; the second was to add a new flag, -XX:NativeMemoryTracking=summary, to see how the process is using native memory.
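
The JVM_OPTS now look roughly like this (assuming the flags from before are kept; reproduced from memory):

env:
- name: JVM_OPTS
  value: "-XX:MaxRAM=512m -Xmx128m -XX:+UseSerialGC -XX:NativeMemoryTracking=summary"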

At the beginning everything was normal; the process started out consuming around 300MB according to kubectl top pod, so I let it run for about 4 hours and then...

kubectl top pod

NAME                    CPU(cores)   MEMORY(bytes)
test-646864bc48-69wm2   54m          645Mi

Kind of expected, right? But then I checked the native memory usage:

jcmd <PID> VM.native_memory summary

Native Memory Tracking:

Total: reserved=2780631KB, committed=536883KB
-                 Java Heap (reserved=131072KB, committed=120896KB)
                            (mmap: reserved=131072KB, committed=120896KB)

-                     Class (reserved=203583KB, committed=92263KB)
                            (classes #17086)
                            (  instance classes #15957, array classes #1129)
                            (malloc=2879KB #44797)
                            (mmap: reserved=200704KB, committed=89384KB)
                            (  Metadata:   )
                            (    reserved=77824KB, committed=77480KB)
                            (    used=76069KB)
                            (    free=1411KB)
                            (    waste=0KB =0.00%)
                            (  Class space:)
                            (    reserved=122880KB, committed=11904KB)
                            (    used=10967KB)
                            (    free=937KB)
                            (    waste=0KB =0.00%)

-                    Thread (reserved=2126472KB, committed=222584KB)
                            (thread #2059)
                            (stack: reserved=2116644KB, committed=212756KB)
                            (malloc=7415KB #10299)
                            (arena=2413KB #4116)

-                      Code (reserved=249957KB, committed=31621KB)
                            (malloc=2269KB #9949)
                            (mmap: reserved=247688KB, committed=29352KB)

-                        GC (reserved=951KB, committed=923KB)
                            (malloc=519KB #1742)
                            (mmap: reserved=432KB, committed=404KB)

-                  Compiler (reserved=1913KB, committed=1913KB)
                            (malloc=1783KB #1343)
                            (arena=131KB #5)

-                  Internal (reserved=7798KB, committed=7798KB)
                            (malloc=7758KB #28415)
                            (mmap: reserved=40KB, committed=40KB)

-                     Other (reserved=32304KB, committed=32304KB)
                            (malloc=32304KB #3030)

-                    Symbol (reserved=20616KB, committed=20616KB)
                            (malloc=17475KB #212850)
                            (arena=3141KB #1)

-    Native Memory Tracking (reserved=5417KB, committed=5417KB)
                            (malloc=347KB #4494)
                            (tracking overhead=5070KB)

-               Arena Chunk (reserved=241KB, committed=241KB)
                            (malloc=241KB)

-                   Logging (reserved=4KB, committed=4KB)
                            (malloc=4KB #184)

-                 Arguments (reserved=17KB, committed=17KB)
                            (malloc=17KB #469)

-                    Module (reserved=286KB, committed=286KB)
                            (malloc=286KB #2704)

Wait, what? 2.1 GB reserved for threads, and 222 MB committed? What is this? I honestly don't know, I just saw it...

I need some time to understand why this is happening.
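
My next step will probably be a thread dump to see what those ~2000 threads actually are (a sketch; the PID is a placeholder):

# Dump every thread's stack; the thread names usually point at whoever created them
jcmd <PID> Thread.print > threads.txt
# or, equivalently, kill -3 <PID> and read the dump from the JVM's stdout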

asked Feb 26 '19 by William Añez



1 Answer

I finally found my issue and I want to share it so that others can benefit from it in some way.

As I found in my last edit, I had a thread problem that was causing all the memory consumption over time. Specifically, we were using an asynchronous method from a third-party library without properly taking care of those resources (in this case, without ensuring those calls were completing correctly).
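
The library itself is not the point, but the shape of the leak was roughly the following (a hypothetical illustration, not the real third-party API): each call started work that never completed, and every leaked thread pins about 1 MB of native stack, which no -Xmx setting caps and which is exactly the Thread section that NMT exposed above.

import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the leak pattern: fire-and-forget work that never
// finishes. A couple of thousand of these reserve roughly 2 GB of stack space,
// matching the "Thread (reserved=2126472KB ...)" line in the NMT summary above.
public class ThreadLeakDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 2000; i++) {
            Thread t = new Thread(() -> {
                try {
                    TimeUnit.HOURS.sleep(1); // stands in for an async call that never ends
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
        }
        // The fix, conceptually: make sure every asynchronous call completes (or times
        // out) and shut down whatever executor or thread the library creates for it.
    }
}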

I was able to detect the issue because I had set a memory limit on my Kubernetes deployment from the beginning (which is good practice in production environments), and then I monitored my app's memory consumption very closely using tools like jstat, jcmd, VisualVM, kill -3, and most importantly the -XX:NativeMemoryTracking=summary flag, which gave me a great deal of detail in this regard.
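
For anyone else chasing something similar, the NMT part of the workflow was roughly this (a sketch; the PID is a placeholder):

# Start the JVM with -XX:NativeMemoryTracking=summary, then:
jcmd <PID> VM.native_memory baseline       # record a baseline
# ... let the app run for a while ...
jcmd <PID> VM.native_memory summary.diff   # show growth per category since the baseline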

answered Sep 25 '22 by William Añez