Why are memcpy() and memmove() faster than pointer increments?

Tags: c++, c, loops

I am copying N bytes from pSrc to pDest. This can be done in a single loop:

for (int i = 0; i < N; i++) *pDest++ = *pSrc++;

Why is this slower than memcpy or memmove? What tricks do they use to speed it up?

asked Oct 15 '11 05:10

wanderer

1 Answer

Because memcpy copies through word pointers instead of byte pointers, so each iteration moves a whole machine word rather than a single byte. In addition, memcpy implementations are often written with SIMD instructions, which make it possible to move 128 bits at a time.
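To illustrate the word-pointer idea, here is a minimal sketch, not the code of any real C library. The name copy_wordwise is hypothetical, and the sketch assumes a 64-bit word and non-overlapping buffers. The fixed-size 8-byte memcpy calls are there only to stay within C's alignment and aliasing rules; compilers typically turn each one into a single load or store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the real memcpy: copies n bytes, eight at a
   time where possible. Like memcpy, it assumes the regions do not overlap. */
void copy_wordwise(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Bulk of the data: one 8-byte load and one 8-byte store per iteration,
       so the loop runs roughly n / 8 times instead of n times. */
    while (n >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s, sizeof word);  /* typically a single load  */
        memcpy(d, &word, sizeof word);  /* typically a single store */
        s += sizeof word;
        d += sizeof word;
        n -= sizeof word;
    }

    /* Tail: the remaining 0 to 7 bytes, one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

Calling copy_wordwise(pDest, pSrc, N) would do the same job as the loop in the question with about one eighth of the iterations. Production implementations usually go further, for example by aligning the pointers first and unrolling the loop.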

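SIMD instructions are assembly instructions that can perform the same operation on each element of a vector up to 16 bytes long (wider on CPUs with newer extensions such as AVX). That includes load and store instructions, so a single instruction can load or store 16 bytes.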
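As a rough sketch of what a 16-byte SIMD copy can look like in C, the following uses the SSE2 intrinsics _mm_loadu_si128 and _mm_storeu_si128 from <emmintrin.h>. The function name copy_simd is hypothetical, the buffers are assumed not to overlap, and on targets without SSE2 the sketch simply falls back to the byte loop. It is an illustration of the technique, not how any particular memcpy is written.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_SSE2_COPY 1
#else
#define HAVE_SSE2_COPY 0
#endif

/* Hypothetical helper: copies n bytes, 16 at a time when SSE2 is available.
   Like memcpy, it assumes the regions do not overlap. */
void copy_simd(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

#if HAVE_SSE2_COPY
    /* Each iteration moves 128 bits with one unaligned load and one
       unaligned store. */
    while (n >= 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, block);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif

    /* Tail (or the whole buffer when SSE2 is not available): byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

The unaligned load and store variants are used so the sketch works for any pointer values; tuned implementations often align the destination first and then switch to aligned stores.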

answered Sep 24 '22 12:09

onemasse
