I'm banging my head against the wall with this issue. We're running many containers in parallel; they run simple filesystem operations or simple Linux commands, and under certain circumstances some of them fail with memory allocation issues and the Docker container gets OOMKilled.
I believe it is not related to the specific command. tail is not the only command that fails; we have also encountered cp or gzip.
We have narrowed down the issue and created a script that will almost certainly fail when its parameters are adjusted to the underlying system.
https://github.com/keboola/processor-oom-test
With the default settings, the script generates a random CSV with 100M rows (~2.5GB), copies it 20 times and then runs 20 containers, each executing tail -n +2 .... On an m5.2xlarge AWS EC2 instance with a 1TB SSD, some of the containers are OOMKilled (and some end with different errors). The processes are terminated with various errors:
/code/tail.sh: line 2: 10 Killed tail -n +2 '/data/source.csv' > '/data/destination.csv'
tail: error reading '/data/source.csv': Cannot allocate memory
tail: write error
(the last one is not OOMKilled)
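For reference, each test container boils down to an invocation along these lines (a sketch only; the image name, mount path and memory limit are illustrative, the real parameters live in the linked repository):

# illustrative invocation, not the exact script from the repo
docker run --rm --memory=256m -v /tmp/oom-test:/data my-test-image sh -c "tail -n +2 '/data/source.csv' > '/data/destination.csv'"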
I am not aware that tail should consume any memory at all. If the number of concurrently running containers is low enough, it can easily survive with 64MB of memory. With a larger number of containers even 256MB is not enough memory. I have been watching htop and docker stats and haven't seen any spikes in memory consumption.
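A snapshot of per-container memory usage can be taken like this (the format string is just one convenient option, not part of the original test setup):

docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'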
Things we have already tried
Some of that helped only partially. Adjusting the memory limit or the number of containers made it crash again every time. We had a container with 1GB of memory running a simple tail on a large file crash with OOMKilled.
More on what I tried several months ago: https://500.keboola.com/cp-in-docker-cannot-allocate-memory-1a5f57113dc4. The --memory-swap option turned out to be only a partial help.
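For completeness, the limits mentioned above are the ordinary docker run flags; the combination we experimented with looks roughly like this (image name and values are illustrative):

docker run --rm --memory=256m --memory-swap=512m -v /tmp/oom-test:/data my-test-image /code/tail.sh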
Any suggestions? I am no Linux expert, so I may be missing something important. Any help or advice is greatly appreciated.
It seems you have a problem with the write cache size.
When you want to write something to the disk, it is not written directly but stored in the write cache (as so-called dirty pages). This is done so that processes don't need to wait until they get permission to write to the disk and can continue their work. But when a process does not get to write to the disk for a long time, its write buffer keeps growing until it reaches the memory limit defined for the container. Then it gets killed by Docker.
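You can watch this buildup while the containers run by looking at the dirty-page counters in /proc/meminfo (a simple sketch, nothing container-specific):

# refresh every second; Dirty and Writeback are reported in kB
watch -n 1 "grep -E 'Dirty|Writeback' /proc/meminfo"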
There's a daemon called pdflush which takes care of flushing the caches and writing those dirty pages to disk. I think you are definitely looking for the parameters vm.dirty_bytes and vm.dirty_background_bytes. These two parameters are well described here: Difference between vm.dirty_ratio and vm.dirty_background_ratio?.
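Before changing anything, it is worth checking what the host currently uses; sysctl can print several variables at once:

sysctl vm.dirty_bytes vm.dirty_background_bytes vm.dirty_ratio vm.dirty_background_ratio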
For example, if you are using a memory limit of --memory=128M and your container runs exactly one process at a time, your vm.dirty_bytes should not exceed 128M expressed in bytes. vm.dirty_background_ratio (there is an option to set either a ratio [% of total memory] or an exact number of bytes) depends on the number of containers you are running at the same time. This value is not so important in your case and you can set it somewhere between 10 and 15.
To set those variables, use sysctl -w. For your case it should be:
sysctl -w vm.dirty_bytes=134217728
sysctl -w vm.dirty_background_ratio=15
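Note that sysctl -w only affects the running kernel. To make the settings survive a reboot, you can drop them into a sysctl configuration file (the file name below is just an example) and reload:

echo 'vm.dirty_bytes = 134217728' >> /etc/sysctl.d/90-dirty.conf
echo 'vm.dirty_background_ratio = 15' >> /etc/sysctl.d/90-dirty.conf
sysctl --system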
Hope it will help!