Ruby OOM in container

Tags:

docker

ruby

Recently we've encountered a problem with Ruby inside a Docker container. Despite quite low load, the application tends to consume huge amounts of memory, and after some time under that load it gets OOM-killed.

After some investigation we narrowed the problem down to this one-liner:

docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; end;'

On some machines it OOMed (was killed by the oom-killer for exceeding the limit) soon after start, but on others it ran slowly yet without OOMs. It seems (only seems, maybe that's not the case) that in some configurations Ruby is able to deduce the cgroup's limits and adjust its GC accordingly.
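A quick way to check whether the limit is even visible inside the container (the path below assumes a cgroup v1 layout, so it may differ between hosts):

docker run -ti -m 209715200 ruby:2.1 ruby -e 'puts File.read("/sys/fs/cgroup/memory/memory.limit_in_bytes")'  # should print 209715200 if the cgroup limit is exposed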

Configurations tested:

  • CentOS 7, Docker 1.9 — OOM
  • CentOS 7, Docker 1.12 — OOM
  • Ubuntu 14.10, Docker 1.9 — OOM
  • Ubuntu 14.10, Docker 1.12 — OOM
  • Mac OS X, Docker 1.12 — No OOM
  • Fedora 23, Docker 1.12 — No OOM

If you look at the memory consumption of the Ruby process, in all cases it behaved similarly to the plot below: either staying at the same level slightly below the limit, or hitting the limit and getting killed.

Memory consumption plot

We want to avoid OOMs at all costs, because they reduce resiliency and pose a risk of losing data. The memory actually needed by the application is way below the limit.

Do you have any suggestions on what to do with Ruby to avoid OOMing, possibly at the cost of some performance?

We can't figure out what the significant differences between the tested installations are.

Edit: Changing the code or increasing the memory limit are not options. The first because we run fluentd with community plugins we have no control over; the second because it wouldn't guarantee that we won't face this issue again in the future.

asked Oct 26 '16 by Mik Vyatskov

3 Answers

You can try to tweak Ruby's garbage collection via environment variables (depending on your Ruby version):

RUBY_GC_MALLOC_LIMIT=4000100
RUBY_GC_MALLOC_LIMIT_MAX=16000100
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1.1

Or call garbage collection manually via GC.start.

For your example, try

docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; array = nil; end;'

to help the garbage collector.
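If dropping the reference isn't enough on its own, a variant of the same one-liner that additionally forces a full collection each iteration (at the cost of throughput, since GC.start in a hot loop is expensive) could look like this:

docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; array = nil; GC.start; end;'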

Edit:

I don't have a comparable environment to yours. On my machine (14.04.5 LTS, docker 1.12.3, RAM 4GB, Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz) the following looks quite promising.

docker run -ti -m 500MB -e "RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1" \
                        -e "RUBY_GC_MALLOC_LIMIT=5242880" \
                        -e "RUBY_GC_MALLOC_LIMIT_MAX=16000100" \
                        -e "RUBY_GC_HEAP_INIT_SLOTS=500000" \
  ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; puts `ps -o rss -p #{Process::pid}`.chomp.split("\n").last.strip.to_i / 1024.0 / 1024; puts GC.stat; end;'

But every Ruby app needs a different setup for fine-tuning, and if you experience memory leaks, you're out of luck.

answered Oct 17 '22 by slowjack2k


I don't think this is a Docker issue. You're overusing the resources of the container, and Ruby tends not to behave well once you hit memory thresholds. It can GC, but if another process tries to take some memory, or Ruby attempts to allocate again while you are maxed out, the kernel will (usually) kill the process with the most memory. If you're worried about memory usage on a server, add threshold alerts at 80% RAM and allocate appropriately sized resources for the job. When you start hitting thresholds, allocate more RAM or look at the particular job's parameters/allocations to see whether it needs to be redesigned to have a lower footprint.
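As a rough sketch of such a threshold check from inside the container (the cgroup path is an assumption about a v1 layout, and ps must be available in the image, as it is in the example above):

docker run -ti -m 209715200 ruby:2.1 ruby -e 'limit = File.read("/sys/fs/cgroup/memory/memory.limit_in_bytes").to_i; rss = `ps -o rss= -p #{Process.pid}`.to_i * 1024; puts "RSS: #{rss} bytes of #{limit}"; warn "above 80% of the memory limit" if rss > 0.8 * limit'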

Another potential option, if you really want a nice fixed memory band to GC against, is to use JRuby and set the JVM max memory to leave a little wiggle room below the container memory limit. The JVM handles OOM within its own context better, as it isn't sharing those resources with other processes nor letting the kernel think the server is dying.
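A sketch of that approach against the question's reproduction (the jruby image tag and heap size here are illustrative, not tuned): with the heap capped below the container limit, exhaustion surfaces as a Java OutOfMemoryError inside the process instead of the kernel killing the container.

docker run -ti -m 209715200 jruby:9.1 jruby -J-Xmx150m -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; end;'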

answered Oct 17 '22 by Pyrce


I had a similar issue with a few Java-based Docker containers running on a single Docker host. The problem was that each container saw the total available memory of the host machine and assumed it could use all of it for itself. GC didn't run very often and I ended up getting out-of-memory exceptions. I ended up manually limiting the amount of memory each container could use, and I no longer got OOMs. Within the container I also limited the memory of the JVM.

Not sure if this is the same issue you're seeing but it could be related.

https://docs.docker.com/engine/reference/run/#/runtime-constraints-on-resources
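For the Java containers that meant pairing the two limits, roughly like this (the image tag, jar name, and sizes are placeholders; the point is leaving headroom between the JVM heap and the container limit):

docker run -d -m 512m openjdk:8-jre java -Xmx384m -jar app.jar  # app.jar is a placeholder for your application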

answered Oct 17 '22 by Fabian