Linker performance related to swap space?

Tags: c, linux, gcc, swap, ld

Sometimes it's handy to mock up something with a little C program that uses a big chunk of static memory. After changing to Fedora 15, I noticed that such a program took a long time to compile. We're talking 30s vs. 0.1s. Even weirder, ld (the linker) was maxing out the CPU and slowly eating all available memory. After some fiddling I managed to find a correlation between this new problem and the size of my swap file. Here's an example program for the purposes of this discussion:

#include <string.h>
#include <stdlib.h>
#include <stdio.h>

#define M 1000000
#define GIANT_SIZE (200*M)

size_t g_arr[GIANT_SIZE];

int main( int argc, char **argv){
    int i;
    for(i = 0; i<10; i++){
        printf("This should be zero: %zu\n",g_arr[i]);  /* %zu matches size_t */
    }
    exit(1);
}

This program has a giant array of 200M size_t elements, which at 8 bytes each (x86_64) comes to about 1.6GB of static memory. Compiling this program takes an inordinate amount of time:

[me@bleh]$ time gcc HugeTest.c

real    0m12.954s
user    0m6.995s
sys     0m3.890s
[me@bleh]$

13s for a ~13 line C program!? That's not right. The key number is the size of the static memory space. As soon as it is larger than the total swap space, the program compiles quickly again. For example, I have 5.3GB of swap space, so changing GIANT_SIZE to (1000*M) gives the following time:

[me@bleh]$ time gcc HugeTest.c

real    0m0.087s
user    0m0.026s
sys     0m0.027s

Ah, that's more like it! To further convince myself (and yourself, if you're trying this at home) that swap space was indeed the magic number, I tried changing the available swap space to a truly massive 19GB and trying to compile the (1000*M) version again:

[me@bleh]$ ls -ali /extraswap
5986 -rw-r--r-- 1 root root 14680064000 Jul 26 15:01 /extraswap
[me@bleh]$ sudo swapon /extraswap
[me@bleh]$ time gcc HugeTest.c

real    4m28.089s
user    0m0.016s
sys     0m0.010s

It didn't even complete after 4.5 minutes!

Clearly the linker is doing something wrong here, but I don't know how to work around this other than rewriting the program or messing around with swap space. I'd love to know if there's a solution, or if I've stumbled upon some arcane bug.

By the way, the programs all compile and run correctly, independent of all the swap business.

For reference, here is some possibly relevant information:

[]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 27027
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[]$ uname -r
2.6.40.6-0.fc15.x86_64

[]$ ld --version
GNU ld version 2.21.51.0.6-6.fc15 20110118
Copyright 2011 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

[]$ gcc --version
gcc (GCC) 4.6.1 20110908 (Red Hat 4.6.1-9)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[]$ cat /proc/meminfo
MemTotal:        3478272 kB
MemFree:         1749388 kB
Buffers:           16680 kB
Cached:           212028 kB
SwapCached:       368056 kB
Active:           489688 kB
Inactive:         942820 kB
Active(anon):     401340 kB
Inactive(anon):   803436 kB
Active(file):      88348 kB
Inactive(file):   139384 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:      19906552 kB
SwapFree:       17505120 kB
Dirty:               172 kB
Writeback:             0 kB
AnonPages:        914972 kB
Mapped:            60916 kB
Shmem:              1008 kB
Slab:              55248 kB
SReclaimable:      26720 kB
SUnreclaim:        28528 kB
KernelStack:        3608 kB
PageTables:        63344 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    21645688 kB
Committed_AS:   11208980 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      139336 kB
VmallocChunk:   34359520516 kB
HardwareCorrupted:     0 kB
AnonHugePages:    151552 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      730752 kB
DirectMap2M:     2807808 kB

TL;DR: When the (large) static memory of a C program is slightly less than the available swap space, the linker takes forever to link the program. However, it's quite snappy when the static space is slightly larger than the available swap space. What's up with that!?

asked Nov 22 '11 20:11 by Rooke


2 Answers

I am able to reproduce this on an Ubuntu 10.10 system (GNU ld (GNU Binutils for Ubuntu) 2.20.51-system.20100908), and I think I have your answer. First, some methodology.

After confirming that this also happens in a small VM (512MB RAM, 2GB swap), I decided the easiest thing to do would be to strace gcc and see exactly what was going on when everything went to hell:

~# strace -f gcc swap.c 

It illuminated the following:

vfork()                                 = 3589
[pid  3589] execve("/usr/lib/gcc/x86_64-linux-gnu/4.4.5/collect2", ["/usr/lib/gcc/x86_64-linux-gnu/4."..., "--build-id", "--eh-frame-hdr", "-m", "elf_x86_64", "--hash-style=gnu", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", "-o", "swap", "-z", "relro", "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "-L/usr/lib/gcc/x86_64-linux-gnu/"..., ...], [/* 26 vars */]) = 0

...

[pid  3589] vfork()                     = 3590

...

[pid  3590] execve("/usr/bin/ld", ["/usr/bin/ld", "--build-id", "--eh-frame-hdr", "-m", "elf_x86_64", "--hash-style=gnu", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", "-o", "swap", "-z", "relro", "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "-L/usr/lib/gcc/x86_64-linux-gnu/"..., ...], [/* 27 vars */]) = 0

...

[pid  3590] lseek(13, 4096, SEEK_SET)   = 4096
[pid  3590] read(13, ".\4@\0\0\0\0\0>\4@\0\0\0\0\0N\4@\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  3590] mmap(NULL, 1600004096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1771931000
<system comes to screeching halt>

As we might have suspected, ld appears to be anonymously mmap-ing the entire static memory space of this array (or possibly the entire program; it's hard to tell, because the rest of the program is so small that it might all fit in that extra 4096 bytes).

So that's all well and good, but why does it work when we exceed the available swap on the system? Let's turn swap off and run strace -f again...

[pid  3618] lseek(13, 4096, SEEK_SET)   = 4096
[pid  3618] read(13, ".\4@\0\0\0\0\0>\4@\0\0\0\0\0N\4@\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  3618] mmap(NULL, 1600004096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  3618] brk(0x60638000)             = 0x1046000
[pid  3618] mmap(NULL, 1600135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  3618] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fd011864000

...

Unsurprisingly, ld seems to do the same thing it tried last time: mmap the entire space. But the system is no longer able to do that, so it fails! ld tries again, fails again, and then does something unexpected... it moves on with less memory.

Weird, I guess we'd better have a look at the ld code then. Drat, it doesn't do an explicit mmap. This must be coming from inside a plain old malloc. We'll have to build ld with some debug symbols to track this down. Unfortunately, when I built binutils 2.21.1 the problem went away. Perhaps it's been fixed in newer versions of binutils?

answered Sep 21 '22 13:09 by SoapBox


I don't observe this behavior (with Debian/Sid/AMD64 on an 8GB desktop, gcc 4.6.2, binutils gold ld (GNU Binutils for Debian 2.22) 1.11). Here is the changed program (displaying its memory map with pmap).

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>   /* getpid */

#define M 1000000
#define GIANT_SIZE (2000*M)

size_t g_arr[GIANT_SIZE];

int main( int argc, char **argv){
  int i;
  char cmd[80];
  for(i = 0; i<10; i++){
      printf("This should be zero: %zu\n",g_arr[i*1000]);  /* %zu matches size_t */
  }
  /* show this process's memory map */
  sprintf (cmd, "pmap %d", (int)getpid());
  system(cmd);
  exit(0);
}

Here is its compilation:

% time gcc -v -O big.c -o big
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc-4.6.real
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-4' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-4)
COLLECT_GCC_OPTIONS='-v' '-O' '-o' 'big' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/4.6/cc1 -quiet -v -imultilib . -imultiarch x86_64-linux-gnu big.c -quiet -dumpbase big.c -mtune=generic -march=x86-64 -auxbase big -O -version -o /tmp/ccWThBP5.s
GNU C (Debian 4.6.2-4) version 4.6.2 (x86_64-linux-gnu)
    compiled by GNU C version 4.6.2, GMP version 5.0.2, MPFR version 3.1.0, MPC version 0.9
warning: MPFR header version 3.1.0 differs from library version 3.1.0-p3.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/4.6/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.6/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C (Debian 4.6.2-4) version 4.6.2 (x86_64-linux-gnu)
    compiled by GNU C version 4.6.2, GMP version 5.0.2, MPFR version 3.1.0, MPC version 0.9
warning: MPFR header version 3.1.0 differs from library version 3.1.0-p3.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4b128876859f8f310615c7040fa3cb67
COLLECT_GCC_OPTIONS='-v' '-O' '-o' 'big' '-mtune=generic' '-march=x86-64'
 as --64 -o /tmp/ccm7905b.o /tmp/ccWThBP5.s
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/4.6/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-O' '-o' 'big' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/4.6/collect2 --build-id --no-add-needed --eh-frame-hdr -m elf_x86_64 --hash-style=both -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o big /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.6/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../.. /tmp/ccm7905b.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/4.6/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crtn.o
gcc -v -O big.c -o big  0.07s user 0.01s system 90% cpu 0.089 total

and its execution:

% time ./big
This should be zero: 0
This should be zero: 0
This should be zero: 0
This should be zero: 0
This should be zero: 0
This should be zero: 0
This should be zero: 0
This should be zero: 0
This should be zero: 0
This should be zero: 0
8835:   ./big
0000000000400000      4K r-x--  /home/basile/tmp/big
0000000000401000      4K rw---  /home/basile/tmp/big
0000000000402000 15625000K rw---    [ anon ]
00007f2d15a44000   1512K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007f2d15bbe000   2048K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007f2d15dbe000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007f2d15dc2000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007f2d15dc3000     20K rw---    [ anon ]
00007f2d15dc8000    124K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007f2d15fb4000     12K rw---    [ anon ]
00007f2d15fe4000     12K rw---    [ anon ]
00007f2d15fe7000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007f2d15fe8000      4K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007f2d15fe9000      4K rw---    [ anon ]
00007ffff5b5b000    132K rw---    [ stack ]
00007ffff5bff000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total         15628908K
./big  0.00s user 0.00s system 0% cpu 0.004 total

I believe that using a recent GCC (e.g. GCC 4.6) together with the binutils gold linker makes a significant difference for such programs.

I don't hear any swapping involved.

answered Sep 20 '22 13:09 by Basile Starynkevitch