access violation _mm_store_si128 SSE Intrinsics

I want to create a histogram of vertical gradients in an 8-bit grayscale image. The vertical distance used to calculate the gradient can be specified. I already managed to speed up another part of my code using intrinsics, but it does not work here. The code runs without an exception if the _mm_store_si128 call is commented out. With the call in place, I get an access violation.

What is going wrong here?

#define _mm_absdiff_epu8(a,b) _mm_adds_epu8(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a)) //from opencv
void CreateAbsDiffHistogramUnmanaged(void* source, unsigned int sourcestride, unsigned int height, unsigned int verticalDistance, unsigned int histogram[])
{
unsigned int xcount = sourcestride / 16;
__m128i absdiffData;
unsigned char* bytes = (unsigned char*) _aligned_malloc(16, 16);
__m128i* absdiffresult = (__m128i*) bytes;
__m128i* sourceM = (__m128i*) source;
__m128i* sourceVOffset = (__m128i*)source + verticalDistance * sourcestride;

for (unsigned int y = 0; y < (height - verticalDistance); y++)
{
    for (unsigned int x = 0; x < xcount; x++, ++sourceM, ++sourceVOffset)
    {
        absdiffData = _mm_absdiff_epu8(*sourceM, *sourceVOffset);
        _mm_store_si128(absdiffresult, absdiffData);
        //unroll loop
        histogram[bytes[0]]++;
        histogram[bytes[1]]++;
        histogram[bytes[2]]++;
        histogram[bytes[3]]++;
        histogram[bytes[4]]++;
        histogram[bytes[5]]++;
        histogram[bytes[6]]++;
        histogram[bytes[7]]++;
        histogram[bytes[8]]++;
        histogram[bytes[9]]++;
        histogram[bytes[10]]++;
        histogram[bytes[11]]++;
        histogram[bytes[12]]++;
        histogram[bytes[13]]++;
        histogram[bytes[14]]++;
        histogram[bytes[15]]++;
    }
}
_aligned_free(bytes);
}

Josef Bauer asked Feb 15 '17 10:02


1 Answer

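Your function crashes while loading, because the input data is not 16-byte aligned. Dereferencing a __m128i pointer lets the compiler assume 16-byte alignment and emit an aligned load, which faults on an unaligned address. The crash presumably disappears when _mm_store_si128 is commented out because the result is then unused and the compiler can drop the loads entirely. To fix this, change your code from: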

absdiffData = _mm_absdiff_epu8(*sourceM, *sourceVOffset);

to:

absdiffData = _mm_absdiff_epu8(_mm_loadu_si128(sourceM), _mm_loadu_si128(sourceVOffset));

Here I use unaligned loads (_mm_loadu_si128), which work for any address.
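
As a minimal sketch of the same idea in isolation, the helper below computes the absolute difference of 16 bytes from two pointers that do not need to be aligned. The name abs_diff_16 is hypothetical and only for illustration; the intrinsics are the standard SSE2 ones from emmintrin.h.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper (not part of the original code): writes |a[i] - b[i]|
// for 16 unsigned bytes into out. None of the pointers needs 16-byte alignment.
inline void abs_diff_16(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out)
{
    // _mm_loadu_si128 accepts any address; dereferencing a __m128i* does not.
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));

    // For unsigned bytes, one of the two saturating differences is always zero,
    // so their sum is the absolute difference (same trick as the macro above).
    const __m128i diff = _mm_adds_epu8(_mm_subs_epu8(va, vb), _mm_subs_epu8(vb, va));

    // Unaligned store, so the caller can pass any output buffer.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), diff);
}
```

Calling it with deliberately misaligned pointers, for example buffer + 1, is fine with this version, whereas the dereferencing form can fault on the same input.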

P.S. I have implemented a similar function (SimdAbsSecondDerivativeHistogram) in Simd Library. It has SSE2, AVX2, NEON and Altivec implementations. I hope that it will help you.

P.P.S. I would also strongly recommend checking this line:

__m128i* sourceVOffset = (__m128i*)source + verticalDistance * sourcestride;

It may result in a crash (access to memory outside the input array bounds), because the offset is added to a __m128i pointer and is therefore scaled by 16 bytes per element. You probably meant a byte offset:

__m128i* sourceVOffset = (__m128i*)((char*)source + verticalDistance * sourcestride);
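
Putting both points together, here is a hedged sketch of how the whole function might look with unaligned loads and byte-based row offsets. The name CreateAbsDiffHistogramSketch is hypothetical, and the sketch assumes that the caller zero-initializes a 256-entry histogram and that trailing bytes beyond the last full 16-byte block of each row can be ignored, as in the original loop.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical name; a sketch of the question's function, not a drop-in replacement.
// histogram must have 256 entries and is only incremented, never cleared here.
void CreateAbsDiffHistogramSketch(const void* source, unsigned int sourcestride,
                                  unsigned int height, unsigned int verticalDistance,
                                  unsigned int histogram[])
{
    assert(source != nullptr && histogram != nullptr);

    // Work with byte pointers so that every offset below is in bytes.
    const std::uint8_t* base = static_cast<const std::uint8_t*>(source);
    const unsigned int xcount = sourcestride / 16;  // full 16-byte blocks per row

    // A local aligned buffer replaces _aligned_malloc/_aligned_free.
    alignas(16) std::uint8_t bytes[16];

    // Written as y + verticalDistance < height to avoid unsigned underflow
    // when verticalDistance is larger than height.
    for (unsigned int y = 0; y + verticalDistance < height; ++y)
    {
        const std::uint8_t* row = base + static_cast<std::size_t>(y) * sourcestride;
        const std::uint8_t* rowBelow =
            row + static_cast<std::size_t>(verticalDistance) * sourcestride;

        for (unsigned int x = 0; x < xcount; ++x)
        {
            // Unaligned loads: the image rows may start at any address.
            const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(row + 16 * x));
            const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rowBelow + 16 * x));
            const __m128i diff = _mm_adds_epu8(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));

            // bytes is 16-byte aligned, so the aligned store is safe here.
            _mm_store_si128(reinterpret_cast<__m128i*>(bytes), diff);

            for (int i = 0; i < 16; ++i)
                ++histogram[bytes[i]];
        }
    }
}
```

Recomputing the row pointers from y on every iteration, instead of incrementing two running pointers, keeps the rows in step even when sourcestride is not a multiple of 16.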

ErmIg answered Sep 23 '22 00:09