I am copying N bytes from pSrc
to pDest
. This can be done in a single loop:
for (int i = 0; i < N; i++) *pDest++ = *pSrc++
Why is this slower than memcpy
or memmove
? What tricks do they use to speed it up?
memcpy is only faster if: BOTH buffers, src AND dst, are 4-byte aligned. if so, memcpy() can copy a 32bit word at a time (inside its own loop over the length) if just one buffer is NOT 32bit word aligned - it creates overhead to figure out and it will do at the end a single char copy loop.
It copies 100 mb with memcpy, and then moves about 100 mb with memmove; source and destination are overlapping. Various "distances" for source and destination are tried. Each test is run 10 times, the average time is printed. Memmove is implemented as a SSE optimized assembler code, copying from back to front.
Answer: memcpy() function is is used to copy a specified number of bytes from one memory to another. memmove() function is used to copy a specified number of bytes from one memory to another or to overlap on same memory.
If you know the length of a string, you can use mem functions instead of str functions. For example, memcpy is faster than strcpy because it does not have to search for the end of the string. If you are certain that the source and target do not overlap, use memcpy instead of memmove .
Because memcpy uses word pointers instead of byte pointers, also the memcpy implementations are often written with SIMD instructions which makes it possible to shuffle 128 bits at a time.
SIMD instructions are assembly instructions that can perform the same operation on each element in a vector up to 16 bytes long. That includes load and store instructions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With