Modern x86_64 linux with glibc will detect that CPU has support of AVX extension and will switch many string functions from generic implementation to AVX-optimized version (with help of ifunc dispatchers: 1, 2).
This feature can be good for performance, but it prevents several tool like valgrind (older libVEXs, before valgrind-3.8) and gdb's "target record
" (Reverse Execution) from working correctly (Ubuntu "Z" 17.04 beta, gdb 7.12.50.20170207-0ubuntu2, gcc 6.3.0-8ubuntu1 20170221, Ubuntu GLIBC 2.24-7ubuntu2):
$ cat a.c
#include <string.h>
#define N 1000
int main(){
char src[N], dst[N];
memcpy(dst, src, N);
return 0;
}
$ gcc a.c -o a -fno-builtin
$ gdb -q ./a
Reading symbols from ./a...(no debugging symbols found)...done.
(gdb) start
Temporary breakpoint 1 at 0x724
Starting program: /home/user/src/a
Temporary breakpoint 1, 0x0000555555554724 in main ()
(gdb) record
(gdb) c
Continuing.
Process record does not support instruction 0xc5 at address 0x7ffff7b60d31.
Process record: failed to record execution log.
Program stopped.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:416
416 VMOVU (%rsi), %VEC(4)
(gdb) x/i $pc
=> 0x7ffff7b60d31 <__memmove_avx_unaligned_erms+529>: vmovdqu (%rsi),%ymm4
There is error message "Process record does not support instruction 0xc5
" from gdb's implementation of "target record", because AVX instructions are not supported by the record/replay engine (sometimes the problem is detected on _dl_runtime_resolve_avx
function): https://sourceware.org/ml/gdb/2016-08/msg00028.html "some AVX instructions are not supported by process record", https://bugs.launchpad.net/ubuntu/+source/gdb/+bug/1573786, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=836802, https://bugzilla.redhat.com/show_bug.cgi?id=1136403
Solution proposed in https://sourceware.org/ml/gdb/2016-08/msg00028.html "You can recompile libc (thus ld.so), or hack __init_cpu_features and thus __cpu_features at runtime (see e.g. strcmp)." or set LD_BIND_NOW=1
, but recompiled glibc still has AVX, and ld bind-now doesn't help.
I heard that there are /etc/ld.so.nohwcap
and LD_HWCAP_MASK
configurations in glibc. Can they be used to disable ifunc dispatching to AVX-optimized string functions in glibc?
How does glibc (rtld?) detects AVX, using cpuid
, with /proc/cpuinfo
(probably not), or HWCAP aux (LD_SHOW_AUXV=1 /bin/echo |grep HWCAP
command gives AT_HWCAP: bfebfbff
)?
It looks like there is a nice workaround for this implemented in recent versions of glibc: a "tunables" feature that guides selection of optimized string functions. You can find a general overview of this feature here and the relevant code inside glibc in ifunc-impl-list.c.
Here's how I figured it out. First, I took the address being complained about by gdb:
Process record does not support instruction 0xc5 at address 0x7ffff75c65d4.
I then looked it up in the table of shared libraries:
(gdb) info shared
From To Syms Read Shared Object Library
0x00007ffff7fd3090 0x00007ffff7ff3130 Yes /lib64/ld-linux-x86-64.so.2
0x00007ffff76366b0 0x00007ffff766b52e Yes /usr/lib/x86_64-linux-gnu/libubsan.so.1
0x00007ffff746a320 0x00007ffff75d9cab Yes /lib/x86_64-linux-gnu/libc.so.6
...
You can see that this address is within glibc. But what function, specifically?
(gdb) disassemble 0x7ffff75c65d4
Dump of assembler code for function __strcmp_avx2:
0x00007ffff75c65d0 <+0>: mov %edi,%eax
0x00007ffff75c65d2 <+2>: xor %edx,%edx
=> 0x00007ffff75c65d4 <+4>: vpxor %ymm7,%ymm7,%ymm7
I can look in ifunc-impl-list.c to find the code that controls selecting the avx2 version:
IFUNC_IMPL (i, name, strcmp,
IFUNC_IMPL_ADD (array, i, strcmp,
HAS_ARCH_FEATURE (AVX2_Usable),
__strcmp_avx2)
IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSE4_2),
__strcmp_sse42)
IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSSE3),
__strcmp_ssse3)
IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2_unaligned)
IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2))
It looks like AVX2_Usable
is the feature to disable. Let's rerun gdb accordingly:
GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX2_Usable gdb...
On this iteration it complained about __memmove_avx_unaligned_erms
, which appeared to be enabled by AVX_Usable
- but I found another path in ifunc-memmove.h enabled by AVX_Fast_Unaligned_Load
. Back to the drawing board:
GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX2_Usable,-AVX_Fast_Unaligned_Load gdb ...
On this final round I discovered a rdtscp
instruction in the ASAN shared library, so I recompiled without the address sanitizer and at last, it worked.
In summary: with some work it's possible to disable these instructions from the command line and use gdb's record feature without severe hacks.
There does not seem a straightforward runtime method to patch feature detection. This detection happens rather early in the dynamic linker (ld.so).
Binary patching the linker seems the easiest method at the moment. @osgx described one method where a jump is overwritten. Another approach is just to fake the cpuid result. Normally cpuid(eax=0)
returns the highest supported function in eax
while the manufacturer IDs are returned in registers ebx, ecx and edx. We have this snippet in glibc 2.25 sysdeps/x86/cpu-features.c
:
__cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx);
/* This spells out "GenuineIntel". */
if (ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69)
{
/* feature detection for various Intel CPUs */
}
/* another case for AMD */
else
{
kind = arch_kind_other;
get_common_indeces (cpu_features, NULL, NULL, NULL, NULL);
}
The __cpuid
line translates to these instructions in /lib/ld-linux-x86-64.so.2
(/lib/ld-2.25.so
):
172a8: 31 c0 xor eax,eax
172aa: c7 44 24 38 00 00 00 mov DWORD PTR [rsp+0x38],0x0
172b1: 00
172b2: c7 44 24 3c 00 00 00 mov DWORD PTR [rsp+0x3c],0x0
172b9: 00
172ba: 0f a2 cpuid
So rather than patching branches, we could as well change the cpuid
into a nop
instruction which would result in invocation of the last else
branch (as the registers will not contain "GenuineIntel"). Since initially eax=0
, cpu_features->max_cpuid
will also be 0 and the if (cpu_features->max_cpuid >= 7)
will also be bypassed.
Binary patching cpuid(eax=0)
by nop
this can be done with this utility (works for both x86 and x86-64):
#!/usr/bin/env python
import re
import sys
infile, outfile = sys.argv[1:]
d = open(infile, 'rb').read()
# Match CPUID(eax=0), "xor eax,eax" followed closely by "cpuid"
o = re.sub(b'(\x31\xc0.{0,32}?)\x0f\xa2', b'\\1\x66\x90', d)
assert d != o
open(outfile, 'wb').write(o)
An equivalent Perl variant, -0777
ensures that the file is read at once instead of separating records at line feeds:
perl -0777 -pe 's/\x31\xc0.{0,32}?\K\x0f\xa2/\x66\x90/' < /lib/ld-linux-x86-64.so.2 > ld-linux-x86-64-patched.so.2
# Verify result, should display "Success"
cmp -s /lib/ld-linux-x86-64.so.2 ld-linux-x86-64-patched.so.2 && echo 'Not patched' || echo Success
That was the easy part. Now, I did not want to replace the system-wide dynamic linker, but execute only one particular program with this linker. Sure, that can be done with ./ld-linux-x86-64-patched.so.2 ./a
, but the naive gdb invocations failed to set breakpoints:
$ gdb -q -ex "set exec-wrapper ./ld-linux-x86-64-patched.so.2" -ex start ./a
Reading symbols from ./a...done.
Temporary breakpoint 1 at 0x400502: file a.c, line 5.
Starting program: /tmp/a
During startup program exited normally.
(gdb) quit
$ gdb -q -ex start --args ./ld-linux-x86-64-patched.so.2 ./a
Reading symbols from ./ld-linux-x86-64-patched.so.2...(no debugging symbols found)...done.
Function "main" not defined.
Temporary breakpoint 1 (main) pending.
Starting program: /tmp/ld-linux-x86-64-patched.so.2 ./a
[Inferior 1 (process 27418) exited normally]
(gdb) quit
A manual workaround is described in How to debug program with custom elf interpreter? It works, but it is unfortunately a manual action using add-symbol-file
. It should be possible to automate it a bit using GDB Catchpoints though.
An alternative approach that does not binary linking is LD_PRELOAD
ing a library that defines custom routines for memcpy
, memove
, etc. This will then take precedence over the glibc routines. The full list of functions is available in sysdeps/x86_64/multiarch/ifunc-impl-list.c
. Current HEAD has more symbols compared to the glibc 2.25 release, in total (grep -Po 'IFUNC_IMPL \(i, name, \K[^,]+' sysdeps/x86_64/multiarch/ifunc-impl-list.c
):
memchr, memcmp, __memmove_chk, memmove, memrchr, __memset_chk, memset, rawmemchr, strlen, strnlen, stpncpy, stpcpy, strcasecmp, strcasecmp_l, strcat, strchr, strchrnul, strrchr, strcmp, strcpy, strcspn, strncasecmp, strncasecmp_l, strncat, strncpy, strpbrk, strspn, strstr, wcschr, wcsrchr, wcscpy, wcslen, wcsnlen, wmemchr, wmemcmp, wmemset, __memcpy_chk, memcpy, __mempcpy_chk, mempcpy, strncmp, __wmemset_chk,
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With