I've been given the task of squeezing more performance out of our software. Unfortunately, release builds are done using the debug settings, and attempts to argue for the case of optimisation have been unsuccessful so far.
Compiling for x86 with compiler flags /ZI /Od /arch:SSE2 /FAs. The generated assembly shows that the compiler is not making use of SSE2. Is this because optimisation is disabled?
In the code, there are a few situations similar to this:
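// View the object as raw bytes; the cast is needed unless bufferObject is itself a char.
char* begin = reinterpret_cast<char*>(&bufferObject);
char* end = begin + sizeof(bufferObject);
char result = 0;  // must start at zero, otherwise the XOR result is indeterminate
while ( begin != end ) {
    result ^= *begin++;
}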
I'd like to have the compiler vectorise this operation for me, but it doesn't; I suspect optimisation needs to be enabled.
I hand-coded two solutions: one using an inline __asm block, and the other using the SSE2 intrinsics defined in <emmintrin.h>. I'd prefer not to rely on either.
Further to the questions above, I would like calls to library functions, like memcpy, to use the provided vectorised versions when appropriate. Looking at the assembly code for memcpy, I can see that there is a function called _VEC_memcpy which makes use of SSE2 for faster copying. The block which decides whether to branch to this routine or not is this:
; First, see if we can use a "fast" copy SSE2 routine
; block size greater than min threshold?
cmp ecx,080h
jb Dword_align
; SSE2 supported?
cmp DWORD PTR __sse2_available,0
je Dword_align
; alignments equal?
push edi
push esi
and edi,15
and esi,15
cmp edi,esi
pop esi
pop edi
jne Dword_align
; do fast SSE2 copy, params already set
jmp _VEC_memcpy
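Restated in C++, the three checks in that block amount to roughly the following. This is only a sketch for reasoning about when the fast path can be reached: the function name `would_take_sse2_path` and the `sse2_available` parameter are made up for illustration (the parameter stands in for the CRT's `__sse2_available` flag), and none of this is actual CRT code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring the dispatch checks in the assembly above.
// sse2_available stands in for the CRT's __sse2_available flag.
inline bool would_take_sse2_path(const void* dst, const void* src,
                                 std::size_t count, bool sse2_available)
{
    // cmp ecx,080h / jb Dword_align: blocks under 128 bytes use the slow path.
    if (count < 0x80)
        return false;

    // cmp DWORD PTR __sse2_available,0 / je Dword_align
    if (!sse2_available)
        return false;

    // and edi,15 / and esi,15 / cmp edi,esi: both pointers must have the
    // same offset within a 16-byte block (they need not be 16-byte aligned).
    const std::uintptr_t dstOffset = reinterpret_cast<std::uintptr_t>(dst) & 15;
    const std::uintptr_t srcOffset = reinterpret_cast<std::uintptr_t>(src) & 15;
    return dstOffset == srcOffset;
}
```

So a copy only reaches _VEC_memcpy when it is at least 128 bytes long, the flag is non-zero, and source and destination share the same alignment modulo 16.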
I don't think that _VEC_memcpy is being called... ever. Should the /arch:SSE2 flag be defining this __sse2_available symbol?

Visual Studio 2010 and earlier have no support for automatic vectorization at all.
The purpose of /arch:SSE2 is to allow the compiler to use scalar SSE for floating-point operations instead of the x87 FPU. So you may get some speedup with /arch:SSE2, since it lets the compiler use the XMM registers instead of the x87 register stack. But keep in mind that the speedup is not from vectorization.
If you want vectorization on VS2010, you pretty much have to do it manually with intrinsics.
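As a rough illustration of the manual route, here is one way the byte-wise XOR loop from the question could be written with the SSE2 intrinsics from <emmintrin.h>. It is a sketch, not a tuned implementation: the function name `xor_bytes_sse2` is hypothetical, and it uses unaligned loads so that it works for a buffer at any address.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical helper: XOR together every byte in [data, data + size).
inline unsigned char xor_bytes_sse2(const void* data, std::size_t size)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    __m128i acc = _mm_setzero_si128();

    // XOR 16 bytes per iteration; _mm_loadu_si128 tolerates any alignment.
    std::size_t i = 0;
    for (; i + 16 <= size; i += 16) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        acc = _mm_xor_si128(acc, chunk);
    }

    // Fold the 16 accumulated byte lanes down to a single byte.
    unsigned char lanes[16];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    unsigned char result = 0;
    for (int k = 0; k < 16; ++k)
        result ^= lanes[k];

    // XOR in the remaining (size % 16) tail bytes one at a time.
    for (; i < size; ++i)
        result ^= p[i];

    return result;
}
```

For the case in the question it would be called as `xor_bytes_sse2(&bufferObject, sizeof(bufferObject))`. Because XOR is associative and commutative, combining the bytes lane by lane and folding at the end gives the same result as the original sequential loop.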
Visual Studio 2012 has support for auto-vectorization:
http://msdn.microsoft.com/en-us/library/hh872235%28v=vs.110%29.aspx