I am new to the SSE instructions and I was trying to learn them from this site: http://www.codeproject.com/Articles/4522/Introduction-to-SSE-Programming
I am using the GCC compiler on Ubuntu 10.10 with an Intel Core i7 960 CPU.
Based on the article, I wrote the following code. For two arrays of length ARRAY_SIZE it calculates
fResult[i] = sqrt( fSource1[i]*fSource1[i] + fSource2[i]*fSource2[i] ) + 0.5
Here is the code:
#include <iostream>
#include <iomanip>
#include <ctime>
#include <stdlib.h>
#include <xmmintrin.h> // Contain the SSE compiler intrinsics
#include <malloc.h>
void myssefunction(
    float* pArray1, // [in] first source array
    float* pArray2, // [in] second source array
    float* pResult, // [out] result array
    int nSize) // [in] size of all arrays
{
    int nLoop = nSize/ 4;
    __m128 m1, m2, m3, m4;
    __m128* pSrc1 = (__m128*) pArray1;
    __m128* pSrc2 = (__m128*) pArray2;
    __m128* pDest = (__m128*) pResult;
    __m128 m0_5 = _mm_set_ps1(0.5f); // m0_5[0, 1, 2, 3] = 0.5
    for ( int i = 0; i < nLoop; i++ )
    {
        m1 = _mm_mul_ps(*pSrc1, *pSrc1); // m1 = *pSrc1 * *pSrc1
        m2 = _mm_mul_ps(*pSrc2, *pSrc2); // m2 = *pSrc2 * *pSrc2
        m3 = _mm_add_ps(m1, m2); // m3 = m1 + m2
        m4 = _mm_sqrt_ps(m3); // m4 = sqrt(m3)
        *pDest = _mm_add_ps(m4, m0_5); // *pDest = m4 + 0.5
        pSrc1++;
        pSrc2++;
        pDest++;
    }
}
int main(int argc, char *argv[])
{
    int ARRAY_SIZE = atoi(argv[1]);
    float* m_fArray1 = (float*) _aligned_malloc(ARRAY_SIZE * sizeof(float), 16);
    float* m_fArray2 = (float*) _aligned_malloc(ARRAY_SIZE * sizeof(float), 16);
    float* m_fArray3 = (float*) _aligned_malloc(ARRAY_SIZE * sizeof(float), 16);
    for (int i = 0; i < ARRAY_SIZE; ++i)
    {
        m_fArray1[i] = ((float)rand())/RAND_MAX;
        m_fArray2[i] = ((float)rand())/RAND_MAX;
    }
    myssefunction(m_fArray1 , m_fArray2 , m_fArray3, ARRAY_SIZE);
    _aligned_free(m_fArray1);
    _aligned_free(m_fArray2);
    _aligned_free(m_fArray3);
    return 0;
}
I get the following compilation error:
[Programming/SSE]$ g++ -g -Wall -msse sseintro.cpp
sseintro.cpp: In function ‘int main(int, char**)’:
sseintro.cpp:41: error: ‘_aligned_malloc’ was not declared in this scope
sseintro.cpp:53: error: ‘_aligned_free’ was not declared in this scope
[Programming/SSE]$
Where am I messing up? Am I missing some header files? I seem to have included all the relevant ones.
_aligned_malloc and _aligned_free are Microsoft-isms. On Linux and similar systems, use posix_memalign (declared in <stdlib.h>) or memalign (declared in <malloc.h>). On Mac OS X you can just use malloc, as it is always 16-byte aligned. For portable SSE code you generally want to implement wrapper functions for aligned memory allocations, e.g.:
void * malloc_simd(const size_t size)
{
#if defined _WIN32 // Windows (_WIN32 is predefined by the compiler; plain WIN32 may not be)
    return _aligned_malloc(size, 16);
#elif defined __linux__ // Linux
    return memalign(16, size);
#elif defined __MACH__ // Mac OS X
    return malloc(size);
#else // other (use valloc for page-aligned memory)
    return valloc(size);
#endif
}
Implementation of free_simd is left as an exercise for the reader.
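One possible way to finish that exercise, as a minimal sketch rather than a definitive implementation: the matching free_simd has to call _aligned_free on Windows, while memory obtained from posix_memalign, memalign, valloc or malloc can be released with plain free. The version below is an assumption-laden variant of the wrapper above: it checks _WIN32, and it swaps in posix_memalign for every non-Windows platform instead of the per-platform branches.

```cpp
#include <cstddef>
#include <cstdlib>    // posix_memalign, free (POSIX systems)
#if defined(_WIN32)
#include <malloc.h>   // _aligned_malloc, _aligned_free (Windows)
#endif

// Sketch of an aligned allocator for SSE data (16-byte alignment).
// Assumption: posix_memalign is available on every non-Windows target.
void* malloc_simd(std::size_t size)
{
#if defined(_WIN32)
    return _aligned_malloc(size, 16);
#else
    void* p = nullptr;
    // posix_memalign returns 0 on success and a non-zero error code on failure.
    if (posix_memalign(&p, 16, size) != 0)
        return nullptr;
    return p;
#endif
}

// Counterpart to malloc_simd: releases memory with the matching deallocator.
void free_simd(void* p)
{
#if defined(_WIN32)
    _aligned_free(p);   // memory from _aligned_malloc must not go to free()
#else
    std::free(p);       // posix_memalign memory is released with free()
#endif
}
```

In the question's main, the three _aligned_malloc calls would then become malloc_simd(ARRAY_SIZE * sizeof(float)) and the three _aligned_free calls would become free_simd.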
Short answer: use _mm_malloc and _mm_free from xmmintrin.h instead of _aligned_malloc and _aligned_free.
You should not use _aligned_malloc, _aligned_free, posix_memalign, memalign, or whatever else when you are writing SSE/AVX code. These are all compiler- or platform-specific functions (MSVC, GCC, or POSIX).
Intel introduced the functions _mm_malloc and _mm_free in the Intel compiler specifically for SIMD computations (see this). Other compilers targeting x86 added them too (just as they regularly add Intel intrinsics). In this sense they are the only cross-platform solution: they should be available in every compiler that supports SSE.
These functions are declared in the xmmintrin.h header. Any header for a later SSE/AVX version automatically includes the previous ones, so it would be enough to include only smmintrin.h or emmintrin.h, for example.
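As a rough illustration of this approach, here is a minimal sketch that allocates with _mm_malloc and runs the same sqrt(a*a + b*b) + 0.5 computation as the question. The helper names alloc_floats and hypot_plus_half are hypothetical, and the sketch assumes the element count is a multiple of 4 and that every pointer came from a 16-byte-aligned allocation.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics, plus _mm_malloc and _mm_free

// Hypothetical helper: allocates `count` floats on a 16-byte boundary.
// Returns a null pointer on failure; release the memory with _mm_free.
float* alloc_floats(std::size_t count)
{
    return static_cast<float*>(_mm_malloc(count * sizeof(float), 16));
}

// Hypothetical helper: result[i] = sqrt(a[i]*a[i] + b[i]*b[i]) + 0.5
// Assumes all pointers are 16-byte aligned and count is a multiple of 4;
// any trailing elements beyond the last full group of 4 are not processed.
void hypot_plus_half(const float* a, const float* b, float* result, std::size_t count)
{
    const __m128 half = _mm_set1_ps(0.5f);           // {0.5, 0.5, 0.5, 0.5}
    for (std::size_t i = 0; i + 4 <= count; i += 4)
    {
        const __m128 x = _mm_load_ps(a + i);         // aligned load of 4 floats
        const __m128 y = _mm_load_ps(b + i);
        const __m128 sum = _mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y));
        _mm_store_ps(result + i, _mm_add_ps(_mm_sqrt_ps(sum), half));
    }
}
```

Each buffer obtained from alloc_floats is released with _mm_free, not with free or delete, since the underlying allocator differs between compilers.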