I thought that Docker containers shared their ulimit settings with the host. However, one Docker host reports these ulimit settings:
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63399
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63399
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
But within a container, one gets:
ulimit -a
-f: file size (blocks)             unlimited
-t: cpu time (seconds)             unlimited
-d: data seg size (kb)             unlimited
-s: stack size (kb)                8192
-c: core file size (blocks)        unlimited
-m: resident set size (kb)         unlimited
-l: locked memory (kb)             64
-p: processes                      unlimited
-n: file descriptors               65536
-v: address space (kb)             unlimited
-w: locks                          unlimited
-e: scheduling priority            0
-r: real-time priority             0
Looking at the -n setting specifically: is the container limited to 1024 open files because the host is so limited? Can anyone please explain how the ulimit values reported inside the container differ in meaning from those reported on the underlying Docker host?
Resource limits may be set by Docker during container startup, and you may tune these settings using the --ulimit argument when launching the container. This is easy to verify by running strace on the containerd process during container startup. For example, the following command
$ docker run -it --ulimit nofile=1024 alpine
will produce the following trace:
prlimit64(7246, RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024},  <unfinished ...>
and checking ulimit in the container gives the expected limit value:
-n: file descriptors               1024
When the container is run without an explicit --ulimit, the same check gives a different value (probably inherited from containerd), e.g.:
-n: file descriptors               1048576
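The same check can be made from inside any process, without a shell. The following is a minimal Python sketch, assuming a Linux system with Python 3; the helper names nofile_limits and format_limit are made up for illustration and are not part of Docker. It reads the soft and hard RLIMIT_NOFILE values, the same resource that prlimit64 sets in the trace above.

```python
import resource


def format_limit(value):
    """Render a limit the way ulimit does: a number or 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # A plain `ulimit -n` in bash reports the soft limit; the hard limit is
    # the ceiling up to which an unprivileged process may raise it.
    print(f"soft={format_limit(soft)} hard={format_limit(hard)}")
```

Running it on the host and again inside the container should show the two different sets of values.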
Why is Docker allowed to set limits higher than the ones you observe by checking ulimit on your host? Let's open man 2 prlimit:
A privileged process (under Linux: one with the CAP_SYS_RESOURCE capability
in the initial user namespace) may make arbitrary changes to either limit value.
This means that any process with the CAP_SYS_RESOURCE capability may set any resource limit, and Docker has this capability. You can check this by inspecting the CapEff field of the /proc/$PID/status file, where $PID is the PID of the containerd process, and decoding the value with capsh --decode:
$ pidof docker-containerd
675
$ grep CapEff /proc/675/status
CapEff: 0000003fffffffff
$ capsh --decode=0000003fffffffff
0x0000003fffffffff=cap_chown,<...>,cap_sys_resource,<...>
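If capsh is not installed, the mask can be decoded by hand, because each capability is a single bit. Below is a minimal Python sketch, assuming CAP_SYS_RESOURCE is capability number 24 as defined in linux/capability.h; has_capability and cap_eff_of are hypothetical helper names, not an existing API.

```python
CAP_SYS_RESOURCE = 24  # bit number, assumed from <linux/capability.h>


def has_capability(mask_hex, cap):
    """Return True if bit `cap` is set in a hex capability mask."""
    return bool((int(mask_hex, 16) >> cap) & 1)


def cap_eff_of(pid):
    """Read the CapEff mask of a process from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise ValueError(f"no CapEff field for pid {pid}")


if __name__ == "__main__":
    # The CapEff mask shown above for containerd has bit 24 set.
    print(has_capability("0000003fffffffff", CAP_SYS_RESOURCE))  # prints True
```

To check a live process, pass the result of cap_eff_of(pid) to has_capability instead of the literal mask.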
To summarize: yes, Docker may increase resource limits for the containers, because it has privileges to do so, and you may tune these limits using the --ulimit argument.