Server becomes unresponsive periodically, OOM Killer inactive?

I'm hosting a Ruby application in a Docker container on AWS. Unfortunately, this Ruby application is known to leak memory, so it eventually consumes all of the available memory.

I'm, perhaps naively, expecting the OOM killer to be invoked and kill the Ruby process, but nothing happens. Eventually the machine becomes unresponsive (the web server doesn't respond and SSH stops working). We force a restart of the machine from the AWS console and get the following in the messages log, so it is indeed alive at the time of the restart:

Apr 30 23:07:14 ip-10-0-10-24 init: serial (ttyS0) main process (2947) killed by TERM signal
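One way to confirm whether the OOM killer ran at all before the hang is to search the kernel messages in the same log for its typical lines. The following is a minimal sketch, assuming the log is /var/log/messages (as above) and that the kernel uses wording such as "invoked oom-killer", "Out of memory:" and "Killed process"; those patterns are an assumption and may differ between kernel versions. The helper names are hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Assumed wording of Linux OOM-killer kernel messages; adjust for your kernel.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory:|Killed process \d+")


def find_oom_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path: str) -> List[str]:
    """Read a syslog-style file and return its OOM-killer lines."""
    with open(path, errors="replace") as log:
        return find_oom_events(log)


# Only scan when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for event in scan_log(sys.argv[1]):
        print(event)
```

If this prints nothing for the period before the freeze, that is consistent with the OOM killer never having fired. Reading /var/log/messages usually requires root.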

I don't believe this is resource exhaustion in AWS (i.e. running out of CPU credits). If I restart the application periodically, the server never goes down.

  • I'm not disabling the OOM killer or changing any of the default Docker memory configuration.
  • I'm running a stock Amazon Linux AMI release 2017.03 kernel.
  • This behavior is happening across multiple virtual instances in AWS.

I'm very much at a loss here; why would memory pressure be causing machines to lock up?

asked May 01 '18 16:05 by EightyEight





1 Answer

Apparently the solution I provided didn't help the person who asked the question, but it might help someone else who stumbles upon this page. The following are the two things I suggested that might be causing the problem.

Suggestion 1

I am guessing you are using the official ruby Docker image, so when you run the container, ruby is running as PID 1 inside the container.

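If ruby is running as PID 1, it gets special treatment from the kernel: signals sent from inside the container are ignored unless ruby has installed a handler for them. My suspicion is that this stops ruby from being killed, causing all the problems you are seeing.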
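To check whether ruby really is PID 1, you can look at what PID 1 is inside the container, for example by running a small script there with docker exec. The following is a minimal, Linux-only sketch that reads /proc/1/cmdline; the helper names are hypothetical. If the result is your ruby command rather than an init such as tini, ruby is running as PID 1.

```python
import os
from typing import List


def read_cmdline(pid: int) -> List[str]:
    """Return the argument vector of `pid` from Linux procfs.

    /proc/<pid>/cmdline holds the arguments separated (and terminated) by NUL bytes.
    """
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def describe_pid1() -> str:
    """Describe what runs as PID 1 in the current PID namespace (e.g. the container)."""
    args = read_cmdline(1)
    return " ".join(args) if args else "<empty cmdline>"


def current_is_pid1() -> bool:
    """True if the calling process is itself PID 1 in its namespace."""
    return os.getpid() == 1
```

Inside the container, print(describe_pid1()) shows the command that was started as PID 1.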

To solve this problem, you have to make sure a proper init process runs as PID 1.

Docker 1.13 (API version 1.25) and above has the --init option for the docker run command. This option makes sure that a proper init handles the duties of PID 1, and it also forwards signals to your ruby application.

https://docs.docker.com/engine/reference/commandline/run/

The reference describes it as: "--init (API 1.25+): Run an init inside the container that forwards signals and reaps processes."
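On the command line this is just docker run --init <image> <command>. As a sketch of the same thing driven from Python, the helper below builds the docker run argument list with --init and runs it through subprocess. The image name ruby:2.5 and the command ruby app.rb used in the example are hypothetical placeholders, and the helper names are made up for illustration.

```python
import subprocess
from typing import List, Optional, Sequence


def docker_run_cmd(image: str, command: Optional[Sequence[str]] = None,
                   use_init: bool = True) -> List[str]:
    """Build a `docker run` argument list, adding --init so an init runs as PID 1."""
    cmd = ["docker", "run"]
    if use_init:
        cmd.append("--init")
    cmd.append(image)
    if command:
        cmd.extend(command)
    return cmd


def run_container(image: str, command: Optional[Sequence[str]] = None) -> int:
    """Run the container in the foreground and return docker's exit status."""
    return subprocess.run(docker_run_cmd(image, command), check=False).returncode
```

For example, docker_run_cmd("ruby:2.5", ["ruby", "app.rb"]) yields ["docker", "run", "--init", "ruby:2.5", "ruby", "app.rb"]. Note that --init must come before the image name; anything after the image is passed to the container as its command.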

Docker uses tini (https://github.com/krallin/tini) as this init.
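To illustrate what such an init does, here is a heavily simplified Python sketch of the same idea. It is not tini's code (tini is a C program that handles many more edge cases). It starts the real command as a child, forwards common termination signals to it, reaps any child that exits, and returns the main child's exit status. Orphaned processes are only reparented to this process, and therefore reaped by it, when it actually runs as PID 1. The function name and the chosen set of forwarded signals are assumptions.

```python
import os
import signal
import sys
from typing import Sequence

# Signals relayed to the child (a subset chosen for illustration).
FORWARDED_SIGNALS = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
                     signal.SIGQUIT, signal.SIGUSR1, signal.SIGUSR2)


def run_as_init(argv: Sequence[str]) -> int:
    """Run argv as a child, forward signals to it, reap zombies, return its exit code."""
    child = os.fork()
    if child == 0:
        # In the child: replace this process with the real command.
        try:
            os.execvp(argv[0], list(argv))
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, _frame):
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # child already exited

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED_SIGNALS}
    try:
        while True:
            try:
                pid, status = os.waitpid(-1, 0)  # reap any child, including orphans
            except ChildProcessError:
                return 127  # no children left to wait for
            if pid == child:
                code = os.waitstatus_to_exitcode(status)
                # Killed by a signal: report shell-style 128 + signal number.
                return 128 - code if code < 0 else code
            # Otherwise an orphaned descendant was reaped; keep waiting.
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)


# Only act as an init when a command is given, e.g. `python3 mini_init.py ruby app.rb`.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_as_init(sys.argv[1:]))
```

The script name mini_init.py in the comment is hypothetical; in practice you would simply use docker run --init, which ships a real init.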

Suggestion 2

There is a known issue with the Amazon Linux AMI; the details can be found at https://github.com/aws/amazon-ecs-agent/issues/794. As of writing, I am not sure whether the problem with the AMI has been fixed.

So try a different AMI, as suggested in that thread, for example the Ubuntu AMI.

answered Nov 03 '22 09:11 by Josnidhin
