 

How do I elegantly and safely maximize the amount of heap space allocated to a Java application in Kubernetes?

I have a Kubernetes deployment that deploys a Java application based on the anapsix/alpine-java image. Nothing else runs in the container except for the Java application and the container overhead.

I want to maximise the amount of memory the Java process can use inside the Docker container and minimise the amount of RAM that is reserved but never used.

For example I have:

  1. Two Kubernetes nodes with 8 GiB of RAM each and no swap
  2. A Kubernetes deployment that runs a Java process which needs a maximum of 1 GiB of heap to operate optimally

How can I safely maximise the number of pods running on the two nodes while never having Kubernetes terminate my pods because of memory limits?

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-deployment
  template:
    metadata:
      labels:
        app: my-deployment
    spec:
      containers:
      - name: my-deployment
        image: myreg:5000/my-deployment:0.0.1-SNAPSHOT
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            memory: 1024Mi
          limits:
            memory: 1024Mi

Java 8 update 131+ has an experimental flag, -XX:+UseCGroupMemoryLimitForHeap, to make the JVM respect the Docker memory limit that comes from the Kubernetes deployment.

My Docker experiments show what happens in Kubernetes:

If I run the following in Docker:

docker run -m 1024m anapsix/alpine-java:8_server-jre_unlimited java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XshowSettings:vm -version

I get:

VM settings:
Max. Heap Size (Estimated): 228.00M

This low value is because Java sets -XX:MaxRAMFraction to 4 by default, so I only get about 1/4 of the RAM allocated as heap...

If I run the same command with -XX:MaxRAMFraction=2 in Docker:

docker run -m 1024m anapsix/alpine-java:8_server-jre_unlimited java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XshowSettings:vm -XX:MaxRAMFraction=2 -version

I get:

VM settings:
Max. Heap Size (Estimated): 455.50M

Finally, setting MaxRAMFraction=1 quickly causes Kubernetes to kill my container.

docker run -m 1024m anapsix/alpine-java:8_server-jre_unlimited java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XshowSettings:vm -XX:MaxRAMFraction=1 -version

I get:

VM settings:
Max. Heap Size (Estimated): 910.50M
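
To see why that is risky, remember the heap is not the JVM's only memory consumer: metaspace, thread stacks, code cache and GC structures all come on top of it. A quick way to inspect that overhead (a sketch, assuming the bundled JRE supports Native Memory Tracking) is to enable NMT and print its summary when the JVM exits:

docker run -m 1024m anapsix/alpine-java:8_server-jre_unlimited java \
  -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 \
  -XX:NativeMemoryTracking=summary -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics \
  -version

With the heap sized to the full 1024m limit, the sum of those categories exceeds the cgroup limit and the container gets OOM-killed.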
asked Sep 14 '17 by rjdkolb
2 Answers

Important concepts

  • The memory request is mainly used during (Kubernetes) pod scheduling.
  • The memory limit defines the memory limit for the container's cgroup.
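
To see the limit a container actually gets from its cgroup (a sketch; the pod name my-deployment-pod is a placeholder and the path depends on the node's cgroup version):

# cgroups v1
kubectl exec my-deployment-pod -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# cgroups v2
kubectl exec my-deployment-pod -- cat /sys/fs/cgroup/memory.max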

According to the article Containerize your Java applications, the best way to configure your JVM is to use the following JVM args:

-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0

Note: there is a bug where you need to specify 75.0 and not 75.

To simulate what happens in Kubernetes with limits in the Linux container run:

docker run --memory="300m" openjdk:17-jdk-bullseye java -XX:+UseContainerSupport -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=75.0 -XshowSettings:vm -version

result:

VM settings:
    Max. Heap Size (Estimated): 218.50M
    Using VM: OpenJDK 64-Bit Server VM

It also works on old school Java 8:

docker run --memory="300m" openjdk:8-jdk-bullseye java -XX:+UseContainerSupport -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=75.0 -XshowSettings:vm -version

This way the JVM reads the container's memory limit from the cgroups (cgroups v1 or cgroups v2) and sizes its heap accordingly. Having a limit is extremely important to prevent evictions and noisy neighbours. I personally set the limit 10% over the request.
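
As a sketch of that layout (the concrete numbers and the use of JAVA_TOOL_OPTIONS are this example's choices, not required values):

    resources:
      requests:
        memory: 1000Mi
      limits:
        memory: 1100Mi          # roughly 10% above the request
    env:
    - name: JAVA_TOOL_OPTIONS   # picked up automatically by the JVM
      value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"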

Older versions of Java, like Java 8, don't read cgroups v2, and Docker Desktop uses cgroups v2. To force Docker Desktop to use legacy cgroups v1, set {"deprecatedCgroupv1": true} in ~/Library/Group\ Containers/group.com.docker/settings.json
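
To check which cgroup version you are actually running against (a sketch; the docker info field requires Docker 20.10 or later):

docker info --format '{{.CgroupVersion}}'
# or, on the host / inside a container: "cgroup2fs" means cgroups v2
stat -fc %T /sys/fs/cgroup/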

answered Sep 18 '22 by rjdkolb

The reason Kubernetes kills your pods is the resource limit. It is difficult to calculate because of container overhead and the usual mismatches between decimal and binary prefixes in memory specifications. My solution is to drop the limit entirely and only keep the request (which is what your pod will be guaranteed in any case once it is scheduled). Rely on the JVM to limit its heap via a static specification and let Kubernetes manage how many pods are scheduled on a single node via the resource request.
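
A minimal sketch of that idea, reusing the deployment from the question (the request value is a placeholder for the figure measured in the next step, and /app.jar plus the args form are assumptions about how the image starts the JVM):

      containers:
      - name: my-deployment
        image: myreg:5000/my-deployment:0.0.1-SNAPSHOT
        args: ["-Xmx1024m", "-Xms1024m", "-jar", "/app.jar"]  # static heap; /app.jar is hypothetical
        resources:
          requests:
            memory: 1280Mi   # measured heap + off-heap + container overhead
          # intentionally no memory limit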

First you will need to determine the actual memory usage of your container when running with your desired heap size. Run a pod with -Xmx1024m -Xms1024m and connect to the Docker daemon of the host it is scheduled on. Run docker ps to find your pod and docker stats <container> to see its current memory usage, which is the sum of the JVM heap, other static JVM usage like direct memory, and your container's overhead (Alpine with glibc). This value should only fluctuate within kibibytes because of some network usage that is handled outside the JVM. Add this value as the memory request to your pod template.
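
For the measurement itself, something along these lines on the node (container names depend on your container runtime):

docker ps | grep my-deployment           # find the container ID
docker stats --no-stream <container-id>  # memory usage = heap + off-heap + container overhead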

Calculate or estimate how much memory other components on your nodes need to function properly. There will at least be the Kubernetes kubelet, the Linux kernel and its userland, probably an SSH daemon and, in your case, a Docker daemon running on them. You can choose a generous default like 1 GiB, excluding the kubelet, if you can spare the extra few bytes. Specify --system-reserved=memory=1Gi and --kube-reserved=memory=100Mi in your kubelet's flags and restart it. This will add those reserved resources to the Kubernetes scheduler's calculations when determining how many pods can run on a node. See the official Kubernetes documentation for more information.
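
The same reservations can be expressed in a KubeletConfiguration file instead of flags (a sketch; the file location depends on how your kubelet is set up):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  memory: 1Gi
kubeReserved:
  memory: 100Mi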

This way there will probably be five to seven pods scheduled on a node with 8 GiB of RAM, depending on the values chosen and measured above. They will be guaranteed the RAM specified in the memory request and will not be terminated. Verify the memory usage via kubectl describe node under Allocated resources. As for elegance/flexibility, you only need to adjust the memory request and the JVM heap size if you want to increase the RAM available to your application.
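
Roughly, with the numbers above: 8 GiB per node minus about 1 GiB system-reserved and 100 MiB kube-reserved leaves close to 7 GiB allocatable; at a measured request of roughly 1 to 1.3 GiB per pod (1 GiB heap plus off-heap and container overhead) that works out to the five to seven pods per node mentioned.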

This approach only works assuming that the pods' memory usage will not explode; if it were not limited by the JVM, a rogue pod might cause eviction, see out-of-resource handling.

answered Sep 18 '22 by Simon Tesar