I'm copying elements from one array to another in C++. I found the rep movs instruction in x86, which seems to copy an array at ESI to an array at EDI of size ECX. However, neither the for nor the while loops I tried compiled to a rep movs instruction in VS 2008 (on an Intel Xeon x64 processor). How can I write code that will get compiled to this instruction?
Honestly, you shouldn't. REP is something of a legacy holdover in the instruction set, and it can be slow, particularly for small copies: it is implemented as a microcoded routine inside the CPU, which adds ROM lookup latency at startup and is not pipelined like ordinary instructions.
In almost every implementation, you will find that the memcpy() compiler intrinsic is both easier to use and faster.
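As a minimal sketch of that advice, here is the same array copy written with std::memcpy and with std::copy. The helper names copy_ints_memcpy and copy_ints_std are made up for illustration, and the sketch assumes the two ranges do not overlap. Whether the compiler inlines the call, uses SSE2, or emits rep movs is left to the implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `count` ints with std::memcpy.
// This is valid because int is trivially copyable; the ranges must not overlap.
inline void copy_ints_memcpy(int* dst, const int* src, std::size_t count)
{
    assert(dst && src);
    std::memcpy(dst, src, count * sizeof(int));
}

// Hypothetical helper: the same copy expressed with std::copy.
// For trivially copyable types, library implementations commonly lower
// this to a memmove/memcpy call, but that is an implementation detail.
inline void copy_ints_std(int* dst, const int* src, std::size_t count)
{
    assert(dst && src);
    std::copy(src, src + count, dst);
}
```

Calling copy_ints_memcpy(dst, src, n) on two int arrays of length n leaves dst equal to src; the std::copy version behaves the same and also works for element types that are not trivially copyable.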
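Under MSVC there are the __movsxxx and __stosxxx intrinsics (__movsb, __movsw, __movsd, __movsq and the matching __stosb, __stosw, __stosd, __stosq, declared in <intrin.h>; the q variants are x64-only) that will generate a REP-prefixed instruction.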
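A rough sketch of how the byte variant might be wrapped is shown below. The function name rep_movsb_copy is hypothetical. The GCC/Clang inline-assembly branch and the memcpy fallback are additions for portability and are not part of the answer above; the assembly branch assumes the direction flag is clear, as the common x86 calling conventions require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(_MSC_VER)
#include <intrin.h>  // declares __movsb
#endif

// Hypothetical wrapper: copy `count` bytes using REP MOVSB where available.
// The ranges must not overlap in a way that a forward byte copy would corrupt.
inline void rep_movsb_copy(void* dst, const void* src, std::size_t count)
{
    assert(dst && src);
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    // MSVC intrinsic: emits REP MOVSB with EDI/ESI/ECX (RDI/RSI/RCX on x64).
    __movsb(static_cast<unsigned char*>(dst),
            static_cast<const unsigned char*>(src),
            count);
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // Assumption: GCC/Clang extended asm; the three registers are read and written.
    __asm__ __volatile__("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(count)
                         :
                         : "memory");
#else
    // Non-x86 targets: fall back to the library routine.
    std::memcpy(dst, src, count);
#endif
}
```

The wrapper takes a byte count, so copying an array of n ints means passing n * sizeof(int). It is worth measuring against plain memcpy before using it in real code.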
There is also a 'hack' to force an intrinsic memset (i.e. REP STOS) under VC9+, as the intrinsic no longer exists due to the SSE2 branching in the CRT. This is better than __stosxxx because the compiler can optimize it for constants and order it correctly.
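#include <windows.h>  // DWORD, WORD, BYTE

// Credits to Nepharius for finding this.
// The function is defined before the macro; otherwise the macro would
// also rewrite the function's own definition.
__forceinline void memset(DWORD* pStart, unsigned long dwFill, size_t nSize)
{
    // Fill whole DWORDs first.
    DWORD* pLast = pStart + (nSize >> 2);
    while (pStart < pLast)
        *pStart++ = dwFill;

    // Then write the remaining 0-3 tail bytes.
    if ((nSize &= 3) == 0)
        return;
    if (nSize == 3)
    {
        ((WORD*)pStart)[0] = WORD(dwFill);
        ((BYTE*)pStart)[2] = BYTE(dwFill);
    }
    else if (nSize == 2)
        ((WORD*)pStart)[0] = WORD(dwFill);
    else
        ((BYTE*)pStart)[0] = BYTE(dwFill);
}

// Replicate the fill byte into all four bytes of a DWORD and route calls
// to the overload above. The casts avoid shifting a signed int into the sign bit.
#define memset(mem, fill, size) \
    memset((DWORD*)(mem), ((DWORD)(BYTE)(fill) << 24 | (DWORD)(BYTE)(fill) << 16 | (DWORD)(BYTE)(fill) << 8 | (DWORD)(BYTE)(fill)), (size))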
Of course, REP isn't always the best thing to use. IMO you're way better off using memcpy: it'll branch to either SSE2 or REP MOVS based on your system (under MSVC), unless you feel like writing custom assembly for 'hot' areas...