
How do I stop GCC from optimizing this byte-for-byte copy into a memcpy call?

As part of my implementation of the standard C library, I have this code for memcpy, which copies memory from src to dest one byte at a time:

void *memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    char *dp = (char *restrict)dest;
    const char *sp = (const char *restrict)src;

    while( len-- )
    {
        *dp++ = *sp++;
    }

    return dest;
}

With gcc -O2, the code generated is reasonable:

memcpy:
.LFB0:
        movq    %rdi, %rax
        testq   %rdx, %rdx
        je      .L2
        xorl    %ecx, %ecx
.L3:
        movzbl  (%rsi,%rcx), %r8d
        movb    %r8b, (%rax,%rcx)
        addq    $1, %rcx
        cmpq    %rdx, %rcx
        jne     .L3
.L2:
        ret
.LFE0:

However, with gcc -O3, GCC optimizes this naive byte-for-byte copy into a memcpy call:

memcpy:
.LFB0:
        testq   %rdx, %rdx
        je      .L7
        subq    $8, %rsp
        call    memcpy
        addq    $8, %rsp
        ret
.L7:
        movq    %rdi, %rax
        ret
.LFE0:

This won't work (memcpy unconditionally calls itself), and it causes a segfault.

I've tried passing -fno-builtin-memcpy and -fno-loop-optimizations, and the same thing occurs.

I'm using GCC version 8.3.0:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-cros-linux-gnu/8.3.0/lto-wrapper
Target: x86_64-cros-linux-gnu
Configured with: ../configure --prefix=/usr/local --libdir=/usr/local/lib64 --build=x86_64-cros-linux-gnu --host=x86_64-cros-linux-gnu --target=x86_64-cros-linux-gnu --enable-checking=release --disable-multilib --enable-threads=posix --disable-bootstrap --disable-werror --disable-libmpx --enable-static --enable-shared --program-suffix=-8.3.0 --with-arch-64=x86-64
Thread model: posix
gcc version 8.3.0 (GCC) 

How do I disable the optimization that causes the copy to be transformed into a memcpy call?

asked Aug 17 '19 at 19:08 by S.S. Anne




2 Answers

One thing that seems to be sufficient here: instead of using -fno-builtin-memcpy, use -fno-builtin, and only for compiling the translation unit that defines memcpy!

An alternative would be to pass -fno-tree-loop-distribute-patterns, though this might be brittle, as it forbids the compiler from first reorganizing the loop code and then replacing parts of it with calls to mem* functions.

Or, since you cannot rely on anything in the C library, perhaps using -ffreestanding could be in order.
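
As a rough sketch of the same idea applied to a single function rather than a whole translation unit: GCC's optimize function attribute can carry -fno-tree-loop-distribute-patterns. Neither the attribute nor the names below come from the question; NO_LOOP_TO_LIBCALL and byte_copy are hypothetical, and the GCC manual describes the optimize attribute as a debugging aid rather than something for production code, so treat this as an illustration only.

```c
#include <stddef.h>

/* Hypothetical helper macro: ask GCC not to turn loops in the annotated
 * function into calls to mem* functions. Other compilers get an empty
 * macro, so the code still builds there (without the protection). */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* byte_copy is a stand-in name (an assumption for this sketch) so the
 * example can be built and run in a hosted environment without
 * redefining memcpy. */
NO_LOOP_TO_LIBCALL
void *byte_copy(void *restrict dest, const void *restrict src, size_t len)
{
    unsigned char *dp = dest;
    const unsigned char *sp = src;

    /* Copy one byte per iteration, as in the question. */
    while (len--) {
        *dp++ = *sp++;
    }
    return dest;
}
```

Whether the loop actually survives is best confirmed by inspecting the assembly that gcc -O3 -S produces for the function.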


This won't work (memcpy unconditionally calls itself), and it causes a segfault.

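Redefining memcpy is undefined behavior: the C standard reserves the name as an identifier with external linkage (C11 7.1.3), and a program that defines a reserved identifier has undefined behavior.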

How do I disable the optimization that causes the copy to be transformed into a memcpy call (preferably while still compiling with -O3)?

Don't. The best approach is fixing your code instead:

  • In most cases, you should use another name.

  • In the rare case you are really implementing a C library (as discussed in the comments), and you really want to reimplement memcpy, then you should be using compiler-specific options to achieve that. For GCC, see -fno-builtin* and -ffreestanding, as well as -nodefaultlibs and -nostdlib.
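
A minimal sketch of how the two cases in the list above might be kept apart in one source file, assuming the standard __STDC_HOSTED__ macro, which GCC sets to 0 under -ffreestanding. The names COPY_NAME and my_memcpy are hypothetical. In a hosted build the function gets a non-reserved name; only a freestanding build defines memcpy itself. This does not by itself stop the loop from being turned into a call, so the compiler options discussed above are still needed when building the C library.

```c
#include <stddef.h>

/* Hypothetical naming scheme: define memcpy itself only when the compiler
 * reports a freestanding environment; otherwise use a non-reserved name. */
#if __STDC_HOSTED__
#define COPY_NAME my_memcpy
#else
#define COPY_NAME memcpy
#endif

void *COPY_NAME(void *restrict dest, const void *restrict src, size_t len)
{
    unsigned char *dp = dest;
    const unsigned char *sp = src;

    /* Same byte-for-byte loop as in the question. */
    while (len--) {
        *dp++ = *sp++;
    }
    return dest;
}
```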

answered Oct 20 '22 at 17:10 by Acorn
