I tried to write my own memcpy using the rep movsb instruction. It works correctly for any size when optimization is disabled, but when I enable optimization it no longer works as expected.
I read about enhanced movsb for memcpy in the Intel® 64 and IA-32 Architectures Optimization Reference Manual, section 3.7.6. I then looked at the libc source code and saw that the default memcpy in libc uses SSE instead of movsb.
Hence, I want to compare the performance of the SSE instructions and rep movsb for memcpy. But now I have found something wrong with my implementation.
#include <stdio.h>
#include <string.h>

inline static void *my_memcpy(
    register void *dest,
    register const void *src,
    register size_t n
) {
    __asm__ volatile(
        "mov %0, %%rdi;"
        "mov %1, %%rsi;"
        "mov %2, %%rcx;"
        "rep movsb;"
        :
        : "r"(dest), "r"(src), "r"(n)
        : "rdi", "rsi", "rcx"
    );
    return dest;
}

#define to_boolean_str(A) ((A) ? "true" : "false")

int main()
{
    char src[32];
    char dst[32];
    memset(src, 'a', 32);
    memset(dst, 'b', 32);
    my_memcpy(dst, src, 1);
    printf("%s\n", to_boolean_str(!memcmp(dst, src, 1)));
    my_memcpy(dst, src, 2);
    printf("%s\n", to_boolean_str(!memcmp(dst, src, 2)));
    my_memcpy(dst, src, 3);
    printf("%s\n", to_boolean_str(!memcmp(dst, src, 3)));
    return 0;
}
ammarfaizi2@integral:~$ gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ammarfaizi2@integral:~$ gcc -O0 test.c -o test && ./test
true
true
true
ammarfaizi2@integral:~$ gcc -O1 test.c -o test && ./test
false
true
true
ammarfaizi2@integral:~$ gcc -O2 test.c -o test && ./test
false
true
true
ammarfaizi2@integral:~$ gcc -O3 test.c -o test && ./test
false
true
true
ammarfaizi2@integral:~$
The call my_memcpy(dst, src, 1); gives the wrong result when optimizations are enabled.
As written, your asm constraints do not reflect that the asm statement can modify memory, so the compiler can freely reorder it with respect to operations that read or write the memory at dest or src. You need to add "memory" to the clobber list.
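The smallest change along these lines might look like the sketch below. It assumes GCC or Clang targeting x86-64 and keeps the mov sequence from the question; the only difference in the asm is the extra "memory" clobber. The names my_memcpy_clobber and check_clobber_version are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name. Same body as the question's my_memcpy, plus a
 * "memory" clobber so the compiler knows the asm reads and writes
 * memory that is not described by the operands. */
static inline void *my_memcpy_clobber(void *dest, const void *src, size_t n)
{
    __asm__ volatile(
        "mov %0, %%rdi;"
        "mov %1, %%rsi;"
        "mov %2, %%rcx;"
        "rep movsb;"
        :
        : "r"(dest), "r"(src), "r"(n)
        : "rdi", "rsi", "rcx", "memory"
    );
    return dest;
}

/* Hypothetical helper: repeats the 1-, 2- and 3-byte copies from the
 * question and asserts that each one took effect. */
static void check_clobber_version(void)
{
    char src[32];
    char dst[32];
    memset(src, 'a', sizeof src);
    memset(dst, 'b', sizeof dst);
    for (size_t n = 1; n <= 3; n++) {
        my_memcpy_clobber(dst, src, n);
        assert(memcmp(dst, src, n) == 0);
    }
}
```

Calling check_clobber_version() from main and building at -O0 through -O3 is one way to confirm that the result no longer depends on the optimization level.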
As others have noted, you should also edit the constraints to avoid the mov instructions. If you do so, you'll also need to represent in the constraints the fact that the asm now modifies its arguments (e.g. make them all dual input/output operands) and back up the value of dest so you can return it. So you might skip this improvement until you have it working to begin with and understand how constraints work.
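For reference, a version without the mov instructions could be sketched as follows, again assuming GCC or Clang on x86-64. The "D", "S" and "c" constraints pin the operands to rdi, rsi and rcx, and the "+" modifier marks each one as both read and written, since rep movsb advances the pointers and counts rcx down to zero. The names my_memcpy_constrained and check_constrained_version are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name. The operands are bound directly to the registers
 * that rep movsb uses, so no explicit mov is needed. */
static inline void *my_memcpy_constrained(void *dest, const void *src, size_t n)
{
    void *ret = dest; /* rep movsb advances rdi, so save the original dest */
    __asm__ volatile(
        "rep movsb"
        : "+D"(dest), "+S"(src), "+c"(n) /* rdi, rsi, rcx: read and written */
        :
        : "memory"                       /* the copy itself touches memory */
    );
    return ret;
}

/* Hypothetical helper: checks the return value and that only the first
 * n bytes of the destination change. */
static void check_constrained_version(void)
{
    char src[32];
    char dst[32];
    memset(src, 'a', sizeof src);
    memset(dst, 'b', sizeof dst);
    assert(my_memcpy_constrained(dst, src, 3) == dst);
    assert(memcmp(dst, src, 3) == 0);
    assert(dst[3] == 'b');
}
```

Because the operands are declared as read-write, the compiler knows not to reuse the old values of dest, src or n after the asm statement, which is why the return value is copied into ret beforehand.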