How do I stop GCC from optimizing this byte-for-byte copy into a memcpy call?

Tags: c, compiler-optimization, gcc

As part of my implementation of the standard C library, I have this code for memcpy, which copies memory from src to dest one byte at a time:

void *memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    char *dp = (char *restrict)dest;
    const char *sp = (const char *restrict)src;

    while( len-- )
    {
        *dp++ = *sp++;
    }

    return dest;
}

With gcc -O2, the code generated is reasonable:

memcpy:
.LFB0:
        movq    %rdi, %rax
        testq   %rdx, %rdx
        je      .L2
        xorl    %ecx, %ecx
.L3:
        movzbl  (%rsi,%rcx), %r8d
        movb    %r8b, (%rax,%rcx)
        addq    $1, %rcx
        cmpq    %rdx, %rcx
        jne     .L3
.L2:
        ret
.LFE0:

However, with gcc -O3, GCC optimizes this naive byte-for-byte copy into a memcpy call:

memcpy:
.LFB0:
        testq   %rdx, %rdx
        je      .L7
        subq    $8, %rsp
        call    memcpy
        addq    $8, %rsp
        ret
.L7:
        movq    %rdi, %rax
        ret
.LFE0:

This won't work (memcpy unconditionally calls itself), and it causes a segfault.

I've tried passing -fno-builtin-memcpy and -fno-loop-optimizations, and the same thing occurs.

I'm using GCC version 8.3.0:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-cros-linux-gnu/8.3.0/lto-wrapper
Target: x86_64-cros-linux-gnu
Configured with: ../configure --prefix=/usr/local --libdir=/usr/local/lib64 --build=x86_64-cros-linux-gnu --host=x86_64-cros-linux-gnu --target=x86_64-cros-linux-gnu --enable-checking=release --disable-multilib --enable-threads=posix --disable-bootstrap --disable-werror --disable-libmpx --enable-static --enable-shared --program-suffix=-8.3.0 --with-arch-64=x86-64
Thread model: posix
gcc version 8.3.0 (GCC)

How do I disable the optimization that causes the copy to be transformed into a memcpy call?

657

asked Aug 17 '19 19:08

S.S. Anne

2 Answers

One thing that seems to be sufficient here: instead of using -fno-builtin-memcpy, use -fno-builtin, and only when compiling the translation unit that defines memcpy.

An alternative would be to pass -fno-tree-loop-distribute-patterns, though this might be brittle: it forbids the compiler from first reorganizing the loop code and then replacing parts of it with calls to mem* functions.

Or, since you cannot rely on anything in the C library, perhaps using -ffreestanding could be in order.
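If changing the flags for the whole translation unit is inconvenient, the same idea can be sketched per function. This is an assumption on my part rather than something the answer above states: GCC's optimize function attribute is used to pass -fno-tree-loop-distribute-patterns to a single function. The macro name NO_LOOP_TO_LIBCALL and the function names byte_copy and byte_copy_self_check are hypothetical; in the actual library the copy routine would be named memcpy. Compilers other than GCC get an empty macro here and may need a different mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC only: ask the optimizer not to replace loops in the annotated
   function with calls to mem* routines. Hypothetical macro name. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((__optimize__("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL /* assumption: nothing to do elsewhere */
#endif

/* Hypothetical name; a C library would define this as memcpy. */
NO_LOOP_TO_LIBCALL
void *byte_copy(void *restrict dest, const void *restrict src, size_t len)
{
    unsigned char *dp = dest;
    const unsigned char *sp = src;

    /* Plain byte-for-byte copy, same shape as the loop in the question. */
    while (len--) {
        *dp++ = *sp++;
    }
    return dest;
}

/* Small self-check: returns 0 when the copy matches the source. */
int byte_copy_self_check(void)
{
    const char src[] = "hello, world";
    char dst[sizeof src] = {0};

    if (byte_copy(dst, src, sizeof src) != dst)
        return 1;
    return memcmp(dst, src, sizeof src) != 0;
}
```

Whichever variant is used, it is worth checking the output of gcc -O3 -S for a remaining call memcpy, since the attribute's effect can vary between GCC versions.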

121

answered Oct 20 '22 18:10

Antti Haapala -- Слава Україні

"This won't work (memcpy unconditionally calls itself), and it causes a segfault."

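Redefining memcpy is undefined behavior: in a hosted implementation the names of the standard library functions are reserved, so a program that defines its own memcpy gets no guarantees about what the compiler does with it.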

"How do I disable the optimization that causes the copy to be transformed into a memcpy call (preferably while still compiling with -O3)?"

Don't. The best approach is fixing your code instead:

In most cases, you should use another name.
In the rare case you are really implementing a C library (as discussed in the comments), and you really want to reimplement memcpy, then you should be using compiler-specific options to achieve that. For GCC, see -fno-builtin* and -ffreestanding, as well as -nodefaultlibs and -nostdlib.
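As a minimal sketch of the first option, assuming a hosted program that links against the system C library: the same loop under a hypothetical name, my_memcpy. GCC may still turn the loop into a call to memcpy at -O3, but that call then resolves to the library's memcpy instead of recursing into the function being defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name chosen to avoid the reserved identifier memcpy. */
void *my_memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    unsigned char *dp = dest;
    const unsigned char *sp = src;

    /* If the optimizer rewrites this loop as a memcpy call, the callee is
       the C library's memcpy, not my_memcpy, so there is no self-call. */
    while (len--) {
        *dp++ = *sp++;
    }
    return dest;
}
```

This only helps when a real memcpy is available at link time; for a C library that has to provide memcpy itself, the compiler options in the second bullet are the relevant route.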

answered Oct 20 '22 17:10

Acorn
