
memcpy where size is known at compile time

I find myself tuning a piece of code where memory is copied using memcpy and the third parameter (size) is known at compile time.

The consumer of the function calling memcpy does something similar to this:

template <size_t S>
void foo() {
    // whateverA / whateverB stand in for the real buffers
    void* dstMemory = whateverA;
    void* srcMemory = whateverB;
    memcpy(dstMemory, srcMemory, S);
}

Now, I would have expected that the memcpy intrinsic was smart enough to realise that this:

foo<4>()

... can replace the memcpy in the function with a 32-bit integer assignment. However, to my surprise, I see a >2x speedup doing this:

template<size_t size>
inline void memcpy_fixed(void* dst, const void* src) {
    memcpy(dst, src, size);
}


template<>
inline void memcpy_fixed<4>(void* dst, const void* src) { *((uint32_t*)dst) = *((const uint32_t*)src); }

And rewriting foo to:

template <size_t S>
void foo() {
    void* dstMemory = whateverA;
    void* srcMemory = whateverB;
    memcpy_fixed<S>(dstMemory, srcMemory);
}

Both tests are on clang (OS X) with -O3. I really would have expected the memcpy intrinsic to be smarter about the case where the size is known at compile time.

My compiler flags are:

-gline-tables-only -O3 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer

Am I asking too much of the C++ compiler, or is there some compiler flag I am missing?

Thomas Kejser asked Oct 22 '25 16:10


2 Answers

memcpy is not the same as *((uint32_t*)dst) = *((uint32_t*)src).

memcpy can deal with unaligned memory, whereas the cast-and-assign version assumes both pointers are suitably aligned for uint32_t.
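
To illustrate the difference, here is a minimal sketch of moving four bytes through an address that is deliberately not 4-byte aligned. The helper names (load_u32, store_u32, roundtrip_unaligned) and the buffer layout are made up for this example. Going through memcpy is well defined here, while dereferencing a casted uint32_t* at the same address would not be.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a 32-bit value from any address, aligned or not.
inline std::uint32_t load_u32(const void* src) {
    assert(src != nullptr);
    std::uint32_t value;
    std::memcpy(&value, src, sizeof value);  // no alignment requirement on src
    return value;
}

// Hypothetical helper: writes a 32-bit value to any address, aligned or not.
inline void store_u32(void* dst, std::uint32_t value) {
    assert(dst != nullptr);
    std::memcpy(dst, &value, sizeof value);  // no alignment requirement on dst
}

// Round-trips a value through an odd offset inside a byte buffer.
inline std::uint32_t roundtrip_unaligned(std::uint32_t value) {
    alignas(std::uint32_t) unsigned char buffer[8] = {};
    store_u32(buffer + 1, value);  // buffer + 1 is misaligned for uint32_t
    return load_u32(buffer + 1);
}
```

Whether the compiler turns each of these memcpy calls into a single move instruction depends on the compiler, target, and flags, so it is worth checking the generated assembly rather than assuming.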

By the way, most modern compilers do replace a memcpy of known size with suitable inline code. For small sizes they usually emit something like rep movsb, which may not be the fastest but is good enough in most cases.

If you find that you gain a 2x speedup in your particular case and you really need it, you are free to get your hands dirty (with clear comments).
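
If you do go that route, one way to keep it honest is to write the assumptions into the code itself. The sketch below is only an illustration: memcpy_fixed_checked is a hypothetical name, and the 4-byte path assumes both pointers refer to suitably aligned uint32_t objects, which the generic memcpy path does not require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Generic case: defer to memcpy, which works for any alignment.
template <std::size_t Size>
inline void memcpy_fixed_checked(void* dst, const void* src) {
    std::memcpy(dst, src, Size);
}

// 4-byte case: a direct 32-bit assignment.
// ASSUMPTIONS (not enforced by the compiler):
//   - dst and src are both aligned for std::uint32_t (asserted in debug builds)
//   - dst and src really point at std::uint32_t objects; otherwise the
//     casts below break the strict aliasing rules
template <>
inline void memcpy_fixed_checked<4>(void* dst, const void* src) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % alignof(std::uint32_t) == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % alignof(std::uint32_t) == 0);
    *static_cast<std::uint32_t*>(dst) = *static_cast<const std::uint32_t*>(src);
}
```

The asserts disappear when NDEBUG is defined, so they cost nothing in a release build but catch a misaligned caller during testing.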

Non-maskable Interrupt answered Oct 25 '25 06:10


If both source and destination buffers are provided as function parameters:

template <size_t S>
void foo(char* dst, const char* src) {
    memcpy(dst, src, S);
}

then clang++ 3.5.0 calls memcpy only when S is large, but emits a single movl instruction when S = 4.

However, your source and destination addresses are not parameters of this function and this seems to prevent the compiler from making this aggressive optimization.
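
One way to check this on your own toolchain is to put the template in a small translation unit, force a couple of instantiations, and read the generated assembly (for example with clang++ -O3 -S). The sketch below is just that setup: copy_fixed, self_check, and the sizes 4 and 4096 are illustrative choices, not something from the question. What the compiler emits for each instantiation will vary with version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same shape as foo above: the addresses arrive as parameters
// and the size is a compile-time constant.
template <std::size_t S>
void copy_fixed(char* dst, const char* src) {
    std::memcpy(dst, src, S);
}

// Explicit instantiations so both versions show up in the assembly output.
template void copy_fixed<4>(char*, const char*);
template void copy_fixed<4096>(char*, const char*);

// Quick sanity check that the small instantiation copies exactly 4 bytes.
inline void self_check() {
    const char src[4] = {'a', 'b', 'c', 'd'};
    char dst[5] = {0, 0, 0, 0, 'x'};
    copy_fixed<4>(dst, src);
    assert(std::memcmp(dst, src, 4) == 0);
    assert(dst[4] == 'x');  // the byte after the copied range is untouched
}
```

Comparing the bodies of the two instantiations in the assembly shows directly whether the small copy was lowered to a plain move while the large one still calls memcpy.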

dlask answered Oct 25 '25 06:10


