Below is the code which I got while searching for an optimized memcpy
implementation.
Here's the link
void *memcpy(void *dst, void const *src, size_t len) {
long *plDst = (long *)dst;
long const *plSrc = (long const *)src;
if (!(src & 0xFFFFFFFC) && !(dst & 0xFFFFFFFC)) {
while (len >= 4) {
*plDst++ = *plSrc++;
len -= 4;
}
}
char *pcDst = (char *)plDst;
char const *pcDst = (char const *)plSrc;
while (len--) {
*pcDst++ = *pcSrc++;
}
return (dst);
}
Can somebody explain the line below to me?
if (!(src & 0xFFFFFFFC) && !(dst & 0xFFFFFFFC))
Here they want to check if src
and dst
address are aligned to a 4 byte
boundary or not. Why are they using !
though, as it will make the condition false
every time?
Secondly, is there any scope for more optimization in the above code?
Implementation of memcpy is not a big deal, you need to typecast the given source and destination address to char* (1 byte). After the typecasting copy the data from the source to destination one by one until n (given length). Disclaimer: Below function only to understand the working of memcpy.
"memcpy is more efficient than memmove." In your case, you most probably are not doing the exact same thing while you run the two functions. In general, USE memmove only if you have to. USE it when there is a very reasonable chance that the source and destination regions are over-lapping.
memmove() is similar to memcpy() as it also copies data from a source to destination. memcpy() leads to problems when source and destination addresses overlap as memcpy() simply copies data one by one from one location to another. For example consider below program.
This article, while addressing an interesting subject, falls short of providing correct examples. The code posted is referred to as GNU's newlib source code. Both the GNU project and the newlib team will be surprised to learn of this unexpected convergence statement! the newlib is not a GNU project and most of its source code is not GPL licenced.
This optimized implementation of memcpy
is non portable, sub-optimal and in many aspects incorrect.
The test if (!(src & 0xFFFFFFFC) && !(dst & 0xFFFFFFFC))
attempts to detect if the src
and dst
addresses are aligned on long
boundaries. It is cumbersome and non portable for multiple reasons and downright incorrect as you noticed:
void *
to int
is ugly and implementation defined. The pointers should be cast as (uintptr_t)
for better portability.0xFFFFFFFC
assumes type long
is 4 bytes. This may be incorrect, and indeed type long
is 8-byte long on 64-bit linux and Mac systems.src & 0xFFFFFFFC
is not a check for alignment and is unlikely to be 0
, the intended test for alignment on 4-byte boundaries is src & 3
.Furthermore, the code fails to optimize the case where src
and dst
have the same alignment but are not aligned on long
boundaries.
Other possible improvements include unrolling the loops, using a switch for small values of len
, combining bytes read from src
into long
s to write to dst
once its is aligned on long
boundaries...
Here is an improved alternative:
#include <stdint.h>
void *memcpy(void *dst, void const *src, size_t len) {
unsigned char *pcDst = (unsigned char *)dst;
unsigned char const *pcSrc = (unsigned char const *)src;
if (len >= sizeof(long) * 2
&& ((uintptr_t)src & (sizeof(long) - 1)) == ((uintptr_t)dst & (sizeof(long) - 1))) {
while (((uintptr_t)pcSrc & (sizeof(long) - 1)) != 0) {
*pcDst++ = *pcSrc++;
len--;
}
long *plDst = (long *)pcDst;
long const *plSrc = (long const *)pcSrc;
/* manually unroll the loop */
while (len >= sizeof(long) * 4) {
plDst[0] = plSrc[0];
plDst[1] = plSrc[1];
plDst[2] = plSrc[2];
plDst[3] = plSrc[3];
plSrc += 4;
plDst += 4;
len -= sizeof(long) * 4;
}
while (len >= sizeof(long)) {
*plDst++ = *plSrc++;
len -= sizeof(long);
}
pcDst = (unsigned char *)plDst;
pcSrc = (unsigned char const *)plSrc;
}
while (len--) {
*pcDst++ = *pcSrc++;
}
return dst;
}
Note that the casts from void *
are unnecessary in C, but required in C++.
Here are some important points to keep in mind when trying to optimize code for speed:
memcpy
is usually optimized in assembly or implemented as a built-in by modern compilers.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With