I am porting a function from inline assembly to MASM in Visual Studio 2013 and am having trouble getting a return value out of it.
Here is the C caller and the assembly function prototype:
extern "C" void AbsMax(__m128d* samples, int len, __m128d* pResult);
__m128d AbsMax(__m128d* samples, int len)
{
__m128d absMax = { 0, 0 };
AbsMax(samples, len, &absMax);
return absMax;
}
And the assembly function:
.686 ;Target processor. Use instructions for Pentium class machines
.xmm
.model flat, c ;Use the flat memory model. Use C calling conventions
.code ;Indicates the start of a code segment.
AbsMax proc samples:PTR DWORD, len:DWORD, result:PTR XMMWORD
;; Load up registers. xmm0 is min, xmm1 is max. L is Ch0, H is Ch1.
mov ecx, [len]
shl ecx, 4
mov esi, [samples]
lea esi, [esi+ecx]
neg ecx
pxor xmm0, xmm0
pxor xmm1, xmm1
ALIGN 16
_loop:
movaps xmm2, [esi+ecx]
add ecx, 16
minpd xmm0, xmm2
maxpd xmm1, xmm2
jne _loop
;; Store larger of -min and max for each channel. xmm2 is -min.
pxor xmm2, xmm2
subpd xmm2, xmm0
maxpd xmm1, xmm2
movaps [result], xmm1 ; <=== access violation here
xor eax, eax
xor ebx, ebx
ret
AbsMax ENDP
END
As I understand the convention for MASM, return values are normally returned out through the EAX register. However, since I'm trying to return a 128-bit value I'm assuming an out parameter is the way to go. As you can see in the assembly listing, assigning the out parameter (movaps [result]
) is causing an access violation (Access violation reading location 0x00000000). I've validated the address of result in the debugger and it looks fine.
What am I doing wrong?
For educational purposes, I wrote up a version of your function that uses intrinsics:
#include <immintrin.h>
extern "C" void AbsMax(__m128d* samples, int len, __m128d* pResult)
{
__m128d min = _mm_setzero_pd();
__m128d max = _mm_setzero_pd();
while (len--)
{
min = _mm_min_pd(min, *samples);
max = _mm_max_pd(max, *samples);
++samples;
}
*pResult = _mm_max_pd(max, _mm_sub_pd(_mm_setzero_pd(), min));
}
Then I compiled using the VC++ x64 compiler using cl /c /O2 /FA absmax.cpp
to generate an assembly listing (edited to remove line comments):
; Listing generated by Microsoft (R) Optimizing Compiler Version 18.00.31101.0
include listing.inc
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC AbsMax
_TEXT SEGMENT
samples$ = 8
len$ = 16
pResult$ = 24
AbsMax PROC ; COMDAT
xorps xmm3, xmm3
movaps xmm2, xmm3
movaps xmm1, xmm3
test edx, edx
je SHORT $LN6@AbsMax
npad 3
$LL2@AbsMax:
minpd xmm2, XMMWORD PTR [rcx]
maxpd xmm1, XMMWORD PTR [rcx]
lea rcx, QWORD PTR [rcx+16]
dec edx
jne SHORT $LL2@AbsMax
$LN6@AbsMax:
subpd xmm3, xmm2
maxpd xmm1, xmm3
movaps XMMWORD PTR [r8], xmm1
ret 0
AbsMax ENDP
_TEXT ENDS
END
Noting that x64 uses a __fastcall
convention by default, and shadows the parameters on the stack, I see that the out parameter is in fact being written indirectly through r8
, which is the third integer parameter for x64 code, per MSDN. I think if your assembly code adopts this parameter convention, it will work.
The shadowed stack space is not initialized with the actual parameter values; it's intended for callees, if they need a place to stash the values while using the registers. That's why you're getting a zero value dereference error in your code. There's a calling convention mismatch. The debugger knows about the calling convention, so it can show you the enregistered value for the parameter.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With