I strace'd a java process that was triggering a lot of kernel time to see what syscalls were being used, and was surprised to see that gettimeofday() and clock_gettime() dominated (I suspect it's due to logging), which is strange considering that man vdso states:
When tracing systems calls with strace(1), symbols (system calls) that are exported by the vDSO will not appear in the trace output.
How come these system calls are happening? Is there a way to avoid them?
The machine is running Ubuntu 16.04.1 on EC2.
To make things easier, I created a minimal test program in C (testgtod.c):
#include <stdlib.h>
#include <sys/time.h>
void main(void)
{
    struct timeval tv;
    for(int i = 0; i < 1000; i++) {
        /* glibc wrapped, shouldn't actually syscall */
        gettimeofday(&tv, NULL);
    }
}
I then compiled and ran the program under strace: gcc testgtod.c -o testgtod && sudo strace ./testgtod
The output included a thousand calls to gettimeofday(), despite my expectation.
Things I tested to make sure I'm not seeing things:
Made sure the binary is a 64-bit elf using file
ldd ./testgtod to make sure vDSO is active:
linux-vdso.so.1 => (0x00007ffcee25d000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6f6e161000) /lib64/ld-linux-x86-64.so.2 (0x0000559ed71f3000)
getauxval(AT_SYSINFO_EHDR) != NULL
Replaced gettimeofday(&tv, NULL) calls with syscall(SYS_gettimeofday, &tv, NULL), increased number of calls to 10 million, ran under time - runtime behavior was the same in both cases: ./testgtod  0.16s user 0.83s system 99% cpu 0.998 total.
The issue is related to this being a VM running on Xen, specifically, the Xen clocksource does not yet allow for vDSO access to the clock:
ubuntu@machine:~% cat /sys/devices/system/clocksource/*/current_clocksource
xen
Then, I changed the clocksource to tsc:
ubuntu@machine:~% sudo sh -c "echo tsc >/sys/devices/system/clocksource/clocksource0/current_clocksource"
NOTE: it isn't recommended to move to the tsc clocksource on production machines since it may cause backwards drift for the clock.
See https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/ for a detailed write-up on the interaction between vDSO and the clocksource.
NOTE 2: it seems tsc support in Xen has improved with version 4.0 and with improved CPU support in Sandy Bridge+ platforms. Modern EC2 machines should be okay with tsc. Check Xen version using dmesg | grep "Xen version". Amazon recommended the tsc clocksource already in re:Invent 2015 (https://www.slideshare.net/AmazonWebServices/cmp402-amazon-ec2-instances-deep-dive). I'm not yet running to production with this, but the situation doesn't seem as bad as implied by packagecloud.
Additional reading:
Why rdtsc interacts poorly with VMs
Xen's 4.0 rdtsc changes
Linux kernel timekeeping documentation, discussing the pitfalls of the TSC
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With