Why does `change_protection` hog CPU while loading a large amount of data into RAM?

Tags:

linux

rust

We have built an in-memory database that eats about 100-150 GB of RAM in a single Vec, which is populated like this:

let mut result = Vec::with_capacity(a_very_large_number);
while let Ok(n) = reader.read(&mut buffer) {
    result.push(...);
}

perf top shows that the time is mostly spent in the kernel's change_protection function:

Samples: 48K of event 'cpu-clock', Event count (approx.): 694742858
 62.45%  [kernel]              [k] change_protection
 18.18%  iron                  [.] database::Database::init::h63748
  7.45%  [kernel]              [k] vm_normal_page
  4.88%  libc-2.17.so          [.] __memcpy_ssse3_back
  0.92%  [kernel]              [k] copy_user_enhanced_fast_string
  0.52%  iron                  [.] memcpy@plt

The CPU usage of this function grows as more and more data is loaded into RAM:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12383 iron      20   0  137g  91g 1372 D 76.1 37.9  27:37.00 iron

The code is running on an r3.8xlarge AWS EC2 instance, and transparent hugepages are already disabled:

[~]$ cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]
[~]$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

cpuinfo

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping    : 4
microcode   : 0x428
cpu MHz     : 2500.070
cache size  : 25600 KB
physical id : 0
siblings    : 16
core id     : 0
cpu cores   : 8
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips    : 5000.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

kernel

3.14.35-28.38.amzn1.x86_64

The real question is: why is there so much overhead in that function?

Dapeng asked Oct 03 '15 07:10


1 Answer

This seems to be an OS issue rather than an issue with Rust or with this specific function.

Most OSes (including Linux) use demand paging. By default, Linux will not allocate physical pages for newly allocated memory. Instead, it maps a single read-only zero page over all of the allocated memory (i.e., all of the virtual memory pages will point to this single physical page).

If you attempt to write to the memory, a page fault will happen, a new page will be allocated, and its permissions will be set accordingly.
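
As a hedged, Linux-only illustration of that behaviour (not part of the original answer): the program below reserves 1 GiB, prints its resident set size from /proc/self/statm, touches every page, and prints it again. The reservation alone barely moves the resident size; the writes do.

use std::fs;

// Resident set size in KiB, read from /proc/self/statm (the second field is
// the number of resident pages). Assumes 4 KiB pages.
fn resident_kib() -> u64 {
    let statm = fs::read_to_string("/proc/self/statm").unwrap();
    let pages: u64 = statm.split_whitespace().nth(1).unwrap().parse().unwrap();
    pages * 4
}

fn main() {
    let size = 1 << 30; // 1 GiB, an arbitrary stand-in for the real data set
    let mut v: Vec<u8> = Vec::with_capacity(size);
    println!("after with_capacity: {} KiB resident", resident_kib());

    v.resize(size, 1); // every write faults a fresh page in
    println!("after touching:      {} KiB resident", resident_kib());
}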

I'm guessing that you are seeing this effect in your program. If you try to do the same thing a second time, it should be much faster. There are also ways to control this policy via sysctl: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting.
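
One way to act on the "second time is faster" observation (a minimal sketch, not from the answer; the Vec<u8> and the 1 GiB size are placeholders for the real structures): touch every page of the reservation once, up front, so the page faults and the change_protection work are paid before the load loop instead of being interleaved with it.

// Pre-fault the spare capacity of `buf` by writing one byte per page.
// Writing into spare capacity through as_mut_ptr() is allowed for a Vec;
// the length is never changed here. Assumes 4 KiB pages.
fn prefault(buf: &mut Vec<u8>, capacity: usize) {
    const PAGE_SIZE: usize = 4096;
    buf.reserve(capacity);
    let ptr = buf.as_mut_ptr();
    let mut off = 0;
    while off < capacity {
        // A volatile write forces the kernel to back this page with a real,
        // writable page now rather than on the first push later.
        unsafe { std::ptr::write_volatile(ptr.add(off), 0) };
        off += PAGE_SIZE;
    }
}

fn main() {
    let mut buf: Vec<u8> = Vec::new();
    prefault(&mut buf, 1 << 30); // 1 GiB, as an example
    // ... then populate `buf` as in the question ...
}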

I'm not sure why you disabled THP, but in this case large pages might help you, since the protection change happens once per huge page (2 MiB) instead of once per normal page (4 KiB).
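
If THP were switched from "never" to "madvise", a single allocation could opt in explicitly. A hypothetical sketch (assuming the libc crate as a dependency; not something the answer prescribes):

use std::alloc::{alloc, Layout};

// Allocate `bytes` aligned to 2 MiB and ask the kernel to back the range
// with transparent huge pages. The caller must free it with dealloc() using
// the same Layout.
fn alloc_huge(bytes: usize) -> *mut u8 {
    const HUGE_PAGE: usize = 2 * 1024 * 1024; // 2 MiB
    let layout = Layout::from_size_align(bytes, HUGE_PAGE).expect("bad layout");
    let ptr = unsafe { alloc(layout) };
    assert!(!ptr.is_null(), "allocation failed");
    let rc = unsafe { libc::madvise(ptr as *mut libc::c_void, bytes, libc::MADV_HUGEPAGE) };
    assert_eq!(rc, 0, "madvise(MADV_HUGEPAGE) failed");
    ptr
}

With 2 MiB pages, the number of protection changes for a mapping in the 100-150 GB range drops by a factor of 512 compared to 4 KiB pages.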

ynimous answered Oct 31 '22 17:10