
SSE intrinsics cause normal float operation to return -1.#INV

I am having a problem with an SSE method I am writing that performs audio processing. I have implemented an SSE random function based on Intel's paper here:

http://software.intel.com/en-us/articles/fast-random-number-generator-on-the-intel-pentiumr-4-processor/

I also have a method that converts from Float to S16 using SSE. The conversion is quite simple:

unsigned int Float_S16LE(float *data, const unsigned int samples, uint8_t *dest)
{
  int16_t *dst = (int16_t*)dest;
  const __m128 mul = _mm_set_ps1((float)INT16_MAX);
  __m128 rand;
  const uint32_t even = samples & ~0x3;
  for(uint32_t i = 0; i < even; i += 4, data += 4, dst += 4)
  {
    /* random round to dither */
    FloatRand4(-0.5f, 0.5f, NULL, &rand);

    __m128 rmul = _mm_add_ps(mul, rand);
    __m128 in = _mm_mul_ps(_mm_load_ps(data),rmul);
    __m64 con = _mm_cvtps_pi16(in);

    memcpy(dst, &con, sizeof(int16_t) * 4);
  }
}

FloatRand4 is defined as follows:

static inline void FloatRand4(const float min, const float max, float result[4], __m128 *sseresult = NULL)
{
  const float delta  = (max - min) / 2.0f;
  const float factor = delta / (float)INT32_MAX;
  ...
}

If sseresult != NULL, the __m128 result is returned through it and result is unused. This works perfectly on the first loop iteration, but on the next iteration delta becomes -1.#INF instead of 1.0. If I comment out the line __m64 con = _mm_cvtps_pi16(in); the problem goes away.

I think that the FPU is getting into an unknown state or something.

Geoffrey asked Jan 29 '12 10:01


2 Answers

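Mixing MMX integer operations with regular (x87) floating-point math can produce weird results, because the MMX registers alias the x87 FPU registers. _mm_cvtps_pi16 returns an __m64, so it goes through MMX. If you call:

_mm_empty()

after the MMX code and before any further floating-point math, the FPU is reset into a correct state. Microsoft has Guidelines for When to Use EMMS.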
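
As a minimal sketch of where the call would go, assuming a GCC or Clang x86 build: convert4_mmx is a hypothetical helper name, not something from the question, and the dither step is left out to keep it short.

```cpp
#include <xmmintrin.h>  // SSE; also declares the MMX-based _mm_cvtps_pi16
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert four floats in [-1, 1] to int16 via the MMX path.
inline void convert4_mmx(const float *src, int16_t *dst)
{
  const __m128 scale = _mm_set_ps1(static_cast<float>(INT16_MAX));
  const __m128 in    = _mm_mul_ps(_mm_loadu_ps(src), scale);
  const __m64  con   = _mm_cvtps_pi16(in);  // result is an MMX (__m64) value
  std::memcpy(dst, &con, sizeof(con));      // copy the four int16 values out
  _mm_empty();  // EMMS: hand the registers back to the x87 FPU
}
```

In a loop like the one in the question, a single _mm_empty() after the loop should also be enough, as long as no x87 floating-point code runs inside the loop body.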

Moe answered Sep 23 '22 07:09


  • _mm_load_ps requires a 16-byte-aligned address, and float* data may only be aligned to 4 bytes. Unless you can guarantee the alignment, use _mm_loadu_ps.
  • memcpy will probably kill the advantages achieved with SSE. You should use a store instruction for the __m64, but here again, take care of the alignment. If it's impossible to do an unaligned stream or store of an __m64, I'd either keep it inside an __m128i and do a masked write with _mm_maskmoveu_si128 or store those 8 bytes by hand.

http://msdn.microsoft.com/en-us/library/bytwczae.aspx
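
One way to combine both points is to stay in SSE2 registers and never touch __m64. The following is only a sketch under assumptions: float_to_s16_sse2 is a hypothetical name, SSE2 is assumed to be available, the dither from the question is omitted, and the tail (samples not a multiple of 4) is not handled. It uses _mm_storel_epi64, which writes the low 8 bytes and has no alignment requirement, in place of the masked write mentioned above.

```cpp
#include <emmintrin.h>  // SSE2
#include <cstdint>
#include <cstddef>

// Hypothetical helper: convert floats in [-1, 1] to int16, four at a time.
// Returns the number of samples written (samples rounded down to a multiple of 4).
inline std::size_t float_to_s16_sse2(const float *src, std::size_t samples, int16_t *dst)
{
  const __m128 scale = _mm_set1_ps(static_cast<float>(INT16_MAX));
  const std::size_t even = samples & ~static_cast<std::size_t>(3);
  for (std::size_t i = 0; i < even; i += 4)
  {
    // Unaligned load: src only needs the natural 4-byte float alignment.
    const __m128  in  = _mm_mul_ps(_mm_loadu_ps(src + i), scale);
    const __m128i i32 = _mm_cvtps_epi32(in);        // four int32, current rounding mode
    const __m128i i16 = _mm_packs_epi32(i32, i32);  // saturate to int16, low 64 bits hold the result
    // Store only the low 8 bytes; no alignment requirement on dst.
    _mm_storel_epi64(reinterpret_cast<__m128i *>(dst + i), i16);
  }
  return even;
}
```

Because no MMX register is involved, this version needs no _mm_empty() call.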

Sam answered Sep 22 '22 07:09