"Kubernetes" (v1.10.2) says that my pod (which contains one container) is using about 5GB memory. Inside the container, RSS is saying more like 681MiB. Can anypony explain how to get from 681MiB to 5GB with the following data (or describe how to make up the difference with another command I've omitted, either from the container or from the docker host that is running this container in kubernetes)?
kubectl top pods says 5GB:
% kubectl top pods -l app=myapp
NAME                     CPU(cores)   MEMORY(bytes)
myapp-56b947bf6d-2lcr7   39m          5039Mi
cAdvisor reports a similar number (it might have been from a slightly different time, so please ignore small differences):

container_memory_usage_bytes{pod_name=~".*myapp.*"} 5309456384

5309456384 / 1024.0 / 1024 ~= 5063 ~= 5039
Inside the container, this file appears to be where cAdvisor is getting its data:

% kubectl exec -it myapp-56b947bf6d-2lcr7 bash
meme@myapp-56b947bf6d-2lcr7:/app# cat /sys/fs/cgroup/memory/memory.usage_in_bytes
5309456384
The resident set size (RSS) inside the container does NOT match up (less than 1GB):
meme@myapp-56b947bf6d-2lcr7:/app# kb=$(ps aux | grep -v grep | grep -v 'ps aux' | grep -v bash | grep -v awk | grep -v RSS | awk '{print $6}' | awk '{s+=$1} END {printf "%.0f", s}'); mb=$(expr $kb / 1024); printf "Kb: $kb\nMb: $mb\n"
Kb: 698076
Mb: 681
Full ps aux in case that is helpful:
meme@myapp-56b947bf6d-2lcr7:/app# ps aux | grep -v grep | grep -v 'ps aux' | grep -v bash | grep -v awk
USER       PID %CPU %MEM     VSZ    RSS TTY  STAT START   TIME COMMAND
meme         1  0.0  0.0  151840  10984 ?    Ss   Jun04   0:29 /usr/sbin/apache2 -D FOREGROUND
www-data    10  0.0  0.0  147340   4652 ?    S    Jun04   0:00 /usr/sbin/apache2 -D FOREGROUND
www-data    11  0.0  0.0  148556   4392 ?    S    Jun04   0:16 /usr/sbin/apache2 -D FOREGROUND
www-data    12  0.2  0.0 2080632  11348 ?    Sl   Jun04  31:58 /usr/sbin/apache2 -D FOREGROUND
www-data    13  0.1  0.0 2080384  10980 ?    Sl   Jun04  18:12 /usr/sbin/apache2 -D FOREGROUND
www-data    68  0.3  0.0  349048  94272 ?    Sl   Jun04  47:09 hotapp
www-data   176  0.2  0.0  349624  92888 ?    Sl   Jun04  43:11 hotapp
www-data   179  0.2  0.0  349196  94456 ?    Sl   Jun04  42:20 hotapp
www-data   180  0.3  0.0  349828  95112 ?    Sl   Jun04  44:14 hotapp
www-data   185  0.3  0.0  346644  91948 ?    Sl   Jun04  43:49 hotapp
www-data   186  0.3  0.0  346208  91568 ?    Sl   Jun04  44:27 hotapp
www-data   189  0.2  0.0  350208  95476 ?    Sl   Jun04  41:47 hotapp
Memory section from docker's container stats API:
curl --unix-socket /var/run/docker.sock 'http:/v1.24/containers/a45fc651e7b12f527b677e6a46e2902786bee6620484922016a135e317a42b4e/stats?stream=false' | jq .
# yields:
"memory_stats": {
  "usage": 5327712256,
  "max_usage": 5368344576,
  "stats": {
    "active_anon": 609095680,
    "active_file": 74457088,
    "cache": 109944832,
    "dirty": 28672,
    "hierarchical_memory_limit": 5368709120,
    "inactive_anon": 1687552,
    "inactive_file": 29974528,
    "mapped_file": 1675264,
    "pgfault": 295316278,
    "pgmajfault": 77,
    "pgpgin": 85138921,
    "pgpgout": 84964308,
    "rss": 605270016,
    "rss_huge": 0,
    "shmem": 5513216,
    "total_active_anon": 609095680,
    "total_active_file": 74457088,
    "total_cache": 109944832,
    "total_dirty": 28672,
    "total_inactive_anon": 1687552,
    "total_inactive_file": 29974528,
    "total_mapped_file": 1675264,
    "total_pgfault": 295316278,
    "total_pgmajfault": 77,
    "total_pgpgin": 85138921,
    "total_pgpgout": 84964308,
    "total_rss": 605270016,
    "total_rss_huge": 0,
    "total_shmem": 5513216,
    "total_unevictable": 0,
    "total_writeback": 0,
    "unevictable": 0,
    "writeback": 0
  },
  "limit": 5368709120
},
A comment on https://github.com/google/cadvisor/issues/638 asserts:
Total (memory.usage_in_bytes) = rss + cache
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt says:
usage_in_bytes: For efficiency, as other kernel components, memory cgroup uses some optimization to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz value for efficient access. (Of course, when necessary, it's synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat(see 5.2).
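To sanity-check that relationship, one thing you could do inside the container is sum rss + cache straight out of memory.stat and compare it with memory.usage_in_bytes. This is just a sketch of that check, assuming the same cgroup v1 paths as the files above:

# Compare usage_in_bytes with rss + cache from memory.stat (cgroup v1 layout)
cd /sys/fs/cgroup/memory
usage=$(cat memory.usage_in_bytes)
rss=$(awk '$1 == "rss" {print $2}' memory.stat)
cache=$(awk '$1 == "cache" {print $2}' memory.stat)
echo "usage=$usage  rss+cache=$((rss + cache))  unexplained=$((usage - rss - cache))"

With your numbers, the "unexplained" part is several GB, which is exactly the gap in question.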
https://docs.docker.com/engine/reference/commandline/stats/#parent-command says:
Note: On Linux, the Docker CLI reports memory usage by subtracting page cache usage from the total memory usage. The API does not perform such a calculation but rather provides the total memory usage and the amount from the page cache so that clients can use the data as needed.
And indeed, most of the stuff in /sys/fs/cgroup/memory/memory.stat in the container shows up in the above Docker stats API response (slight differences are from taking the samples at a different time, sorry):

meme@myapp-56b947bf6d-2lcr7:/app# cat /sys/fs/cgroup/memory/memory.stat
cache 119492608
rss 607436800
rss_huge 0
shmem 5525504
mapped_file 1675264
dirty 69632
writeback 0
pgpgin 85573974
pgpgout 85396501
pgfault 296366011
pgmajfault 80
inactive_anon 1687552
active_anon 611213312
inactive_file 32800768
active_file 81166336
unevictable 0
hierarchical_memory_limit 5368709120
total_cache 119492608
total_rss 607436800
total_rss_huge 0
total_shmem 5525504
total_mapped_file 1675264
total_dirty 69632
total_writeback 0
total_pgpgin 85573974
total_pgpgout 85396501
total_pgfault 296366011
total_pgmajfault 80
total_inactive_anon 1687552
total_active_anon 611213312
total_inactive_file 32800768
total_active_file 81166336
total_unevictable 0
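If it helps, the CLI-style "usage minus page cache" figure can be reproduced from the API response with jq. A sketch (the exact field Docker subtracts has varied across versions, so treat this as an approximation of what docker stats would show):

# Sketch: reproduce "usage minus page cache" from the stats API response
curl -s --unix-socket /var/run/docker.sock \
  'http:/v1.24/containers/a45fc651e7b12f527b677e6a46e2902786bee6620484922016a135e317a42b4e/stats?stream=false' \
  | jq '.memory_stats | {usage, cache: .stats.total_cache, usage_minus_cache: (.usage - .stats.total_cache)}'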
Memory info from kubectl describe pod <pod>:

Limits:
  memory:  5Gi
Requests:
  memory:  4Gi
Here's what pmap says inside the container. In this one-liner, I get all process IDs, run pmap -x on them, and pull the Kbytes column from the pmap results. The total result is 256 megabytes (much less than ps's RSS, partially, I think, because many of the processes return no output from pmap -x):

ps aux | awk '{print $2}' | grep -v PID | xargs sudo pmap -x | grep total | grep -v grep | awk '{print $3}' | awk '{s+=$1} END {printf "%.0f", s}'; echo
256820
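For what it's worth, in procps-ng's pmap -x output the third column of the "total" line is virtual size (Kbytes) and the fourth is RSS, so a variant that sums the RSS column may line up better with ps. It will still undercount whenever pmap prints nothing for a process it can't read. A sketch, assuming that column layout:

# Sum the RSS column (4th field of the "total" line) instead of Kbytes (3rd field)
ps aux | awk 'NR > 1 {print $2}' | xargs -n1 sudo pmap -x 2>/dev/null \
  | awk '$1 == "total" {s += $4} END {printf "%.0f kB\n", s}'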
ps_mem.py is mentioned at https://stackoverflow.com/a/133444/6090676. It inspects /proc/$pid/statm and /proc/$pid/smaps. No illumination here (again, it seems to be ignoring some processes):

# python ps_mem.py
 Private  +   Shared  =  RAM used   Program
 1.7 MiB  +   1.0 MiB =   2.7 MiB   apache2
 2.0 MiB  +   1.0 MiB =   3.0 MiB   bash (3)
---------------------------------
                          5.7 MiB
=================================
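Another cross-check that doesn't rely on a particular tool is to sum the Rss fields of /proc/<pid>/smaps directly (values are in kB; run it as root, since smaps for other users' processes may not be readable). A minimal sketch:

# Sum RSS for every mapping of every process, straight from /proc/*/smaps
total=0
for f in /proc/[0-9]*/smaps; do
  rss=$(awk '/^Rss:/ {s += $2} END {print s + 0}' "$f" 2>/dev/null)
  total=$((total + ${rss:-0}))
done
echo "total RSS: ${total} kB (~$((total / 1024)) MiB)"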
There is another question similar to this (but with less information) at "Incorrect reporting of container memory usage by cAdvisor". Thanks!
The --memory parameter limits a container's memory usage, and Docker will kill the container if it tries to use more than that limit.

The memory subsystem of the cgroups feature isolates the memory behavior of a group of processes (tasks) from the rest of the system. It reports on the memory used by the processes in a cgroup and sets limits on that memory.

Docker does not apply memory limits to containers by default; the host kernel manages a container's processes like any other processes, so in theory a container can consume the entire host's memory. An example of setting a limit and seeing it land in the cgroup is sketched below.
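In a Kubernetes pod the memory limit you set becomes exactly this cgroup limit; you can see the same number outside Kubernetes with a plain docker run. A hypothetical example (the container name and image are made up, and the path assumes the default cgroupfs driver on cgroup v1):

# Run a throwaway container with a 5 GiB limit and read back the cgroup limit
docker run -d --name memtest --memory=5g nginx
cat /sys/fs/cgroup/memory/docker/$(docker inspect -f '{{.Id}}' memtest)/memory.limit_in_bytes
# prints 5368709120, the same hierarchical_memory_limit seen in the stats above
docker rm -f memtest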
One thing I didn't see you check here is kernel memory. This is also accounted for in the memory.usage_in_bytes figure, but doesn't appear in memory.stat. You can find it by looking at /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes.
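A quick way to check is to read that kmem counter alongside the figures you already have. A minimal sketch, assuming the same cgroup v1 layout as your other files:

# How much of usage_in_bytes is kernel memory that memory.stat doesn't show?
cd /sys/fs/cgroup/memory
usage=$(cat memory.usage_in_bytes)
kmem=$(cat memory.kmem.usage_in_bytes)
echo "total=$usage  kernel=$kmem  user(ish)=$((usage - kmem))"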
I once saw something similar with one of our .NET Core applications and couldn't figure out exactly what was happening (perhaps a memory leak in .NET Core, since it's unmanaged memory our app doesn't control).
Perhaps it's another breadcrumb for you. Whether that usage is normal depends on your application, but in terms of cgroups, I believe kernel memory use is unconstrained by default.
I don't know if you've already found your answer, but let me give you some information that may help.
cAdvisor extracts many memory-related metrics. We will focus on:

container_memory_usage_bytes = the value in the /sys/fs/cgroup/memory/memory.usage_in_bytes file (the memory usage).

container_memory_working_set_bytes = container_memory_usage_bytes - total_inactive_file (from /sys/fs/cgroup/memory/memory.stat); this is calculated in cAdvisor and is <= container_memory_usage_bytes.

container_memory_rss = the total_rss value from /sys/fs/cgroup/memory/memory.stat.
Now that you know how those metrics are gathered, you need to know that the kubectl top pods command reports the value of the container_memory_working_set_bytes metric, not container_memory_usage_bytes.
So, from your values:

5039Mi (the working set from the kubectl command) ~= 5064Mi (from the memory.usage_in_bytes file) - 28Mi (total_inactive_file from the memory section of Docker's container stats API)
It is also worth mentioning that when the value of container_memory_usage_bytes reaches the limit, your pod will NOT get OOM-killed. But if container_memory_working_set_bytes or container_memory_rss reaches the limit, the pod will be killed.