I was looking the implementation of memcpy.c, I found a different memcpy code. I couldnt understand why do they do (((ADDRESS) s) | ((ADDRESS) d) | c) & (sizeof(UINT) - 1)
#if !defined(__MACHDEP_MEMFUNC)
#ifdef _MSC_VER
#pragma function(memcpy)
#undef __MEMFUNC_ARE_INLINED
#endif
#if !defined(__MEMFUNC_ARE_INLINED)
/* Copy C bytes from S to D.
* Only works if non-overlapping, or if D < S.
*/
EXTERN_C void * __cdecl memcpy(void *d, const void *s, size_t c)
{
if ((((ADDRESS) s) | ((ADDRESS) d) | c) & (sizeof(UINT) - 1)) {
BYTE *pS = (BYTE *) s;
BYTE *pD = (BYTE *) d;
BYTE *pE = (BYTE *) (((ADDRESS) s) + c);
while (pS != pE)
*(pD++) = *(pS++);
}
else {
UINT *pS = (UINT *) s;
UINT *pD = (UINT *) d;
UINT *pE = (UINT *) (BYTE *) (((ADDRESS) s) + c);
while (pS != pE)
*(pD++) = *(pS++);
}
return d;
}
#endif /* ! __MEMFUNC_ARE_INLINED */
#endif /* ! __MACHDEP_MEMFUNC */
The code is testing whether the addresses are aligned suitably for a UINT
. If so, the code copies using UINT
objects. If not, the code copies using BYTE
objects.
The test works by first performing a bitwise OR of the two addresses. Any bit that is on in either address will be on in the result. Then the test performs a bitwise AND with sizeof(UINT) - 1
. It is expected the the size of a UINT
is some power of two. Then the size minus one has all lower bits on. E.g., if the size is 4 or 8, then one less than that is, in binary 112 or 1112. If either address is not a multiple of the size of a UINT
, then it will have one of these bits on, and the test will indicate it. (Usually, the best alignment for an integer object is the same as its size. This is not necessarily true. A modern implementation of this code should use _Alignof(UINT) - 1
instead of the size.)
Copying with UINT
objects is faster, because, at the hardware level, one load or store instruction loads or stores all the bytes of a UINT
(likely four bytes). Processors will typically copy faster when using these instructions than when using four times as many single-byte load or store instructions.
This code is of course implementation dependent; it requires support from the C implementation that is not part of the base C standard, and it depends on specific features of the processor it executes on.
A more advanced memcpy
implementation could contain additional features, such as:
The code
((((ADDRESS) s) | ((ADDRESS) d) | c) & (sizeof(UINT) - 1))
Checks to see if either s
, d
, or c
are not aligned to the size of a UINT
.
For example, if s = 0x7ff30b14
, d = 0x7ffa81d8
, c = 256
, and sizeof(UINT) == 4
, then:
s = 0b1111111111100110000101100010100
d = 0b1111111111110101000000111011000
c = 0b0000000000000000000000100000000
s | d | c = 0b1111111111110111000101111011100
(s | d | c) & 3 = 0b00
So both pointers are aligned. It is easier to copy memory between pointers that are both aligned, and this does it with only one branch.
On many architectures, *(UINT *) ptr
is much faster if ptr
is correctly aligned to the width of a UINT
. On some architectures, *(UINT *) ptr
will actually crash if ptr
is not correctly aligned.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With