Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient floating point comparison (Cortex-A8)

There is a big (~100 000) array of floating point variables, and there is a threshold (also floating point).

The problem is that I have to compare each one variable from the array with a threshold, but NEON flags transfer takes a really long time (~20 cycles in accordance to a profiler).

Is there any efficient way to compare these values?

NOTE: As rounding error doesn't matter, I tried the following:

float arr[10000];
float threshold; 
....

int a = arr[20]; // e.g.
int t = threshold;
if (t > a) {....}

But in this case I getting the following processor command sequence:

vldr.32        s0, [r0]
vcvt.s32.f32   s0, s0
vmov           r0, s0    <--- takes 20 cycles as `vmrs APSR_nzcv, fpscr` in case of 
cmp            r0, r1         floating point comparison

As conversion happens at NEON, there is no matter if I compare integers, by described way or floats.

like image 689
Alex Avatar asked Apr 30 '12 10:04

Alex


People also ask

What is a better way to compare floating point values?

To compare two floating point values, we have to consider the precision in to the comparison. For example, if two numbers are 3.1428 and 3.1415, then they are same up to the precision 0.01, but after that, like 0.001 they are not same.

What is softfp?

softfp. The softfp option is a hybrid between hard and soft . The compiler is allowed to generate hardware floating-point instructions, but it still uses the soft-float ABI. Like with soft , the compiler generates functions to pass floating-point arguments to integer registers.

What is hard float?

Hard float just means the floating point calculations are done by on chip hardware rather than being emulated by software.


3 Answers

If floats are 32-bit IEEE-754 and ints are 32-bit too and if there are no +infinity, -infinity and NaN values, we can compare floats as ints with a little trick:

#include <stdio.h>
#include <limits.h>
#include <assert.h>

#define C_ASSERT(expr) extern char CAssertExtern[(expr)?1:-1]
C_ASSERT(sizeof(int) == sizeof(float));
C_ASSERT(sizeof(int) * CHAR_BIT == 32);

int isGreater(float* f1, float* f2)
{
  int i1, i2, t1, t2;

  i1 = *(int*)f1;
  i2 = *(int*)f2;

  t1 = i1 >> 31;
  i1 = (i1 ^ t1) + (t1 & 0x80000001);

  t2 = i2 >> 31;
  i2 = (i2 ^ t2) + (t2 & 0x80000001);

  return i1 > i2;
}

int main(void)
{
  float arr[9] = { -3, -2, -1.5, -1, 0, 1, 1.5, 2, 3 };
  float thr;
  int i;

  // Make sure floats are 32-bit IEE754 and
  // reinterpreted as integers as we want/expect
  {
    static const float testf = 8873283.0f;
    unsigned testi = *(unsigned*)&testf;
    assert(testi == 0x4B076543);
  }

  thr = -1.5;
  for (i = 0; i < 9; i++)
  {
    printf("%f %s %f\n", arr[i], "<=\0> " + 3*isGreater(&arr[i], &thr), thr);
  }

  thr = 1.5;
  for (i = 0; i < 9; i++)
  {
    printf("%f %s %f\n", arr[i], "<=\0> " + 3*isGreater(&arr[i], &thr), thr);
  }

  return 0;
}

Output:

-3.000000 <= -1.500000
-2.000000 <= -1.500000
-1.500000 <= -1.500000
-1.000000 >  -1.500000
0.000000 >  -1.500000
1.000000 >  -1.500000
1.500000 >  -1.500000
2.000000 >  -1.500000
3.000000 >  -1.500000
-3.000000 <= 1.500000
-2.000000 <= 1.500000
-1.500000 <= 1.500000
-1.000000 <= 1.500000
0.000000 <= 1.500000
1.000000 <= 1.500000
1.500000 <= 1.500000
2.000000 >  1.500000
3.000000 >  1.500000

Of course, it makes sense to precalculate that final integer value in isGreater() that's used in the comparison operator if your threshold doesn't change.

If you are afraid of undefined behavior in C/C++ in the above code, you can rewrite the code in assembly.

like image 103
Alexey Frunze Avatar answered Oct 13 '22 04:10

Alexey Frunze


If your data is float then you should do your comparisons with floats, e.g.

float arr[10000];
float threshold;
....

float a = arr[20]; // e.g.
if (threshold > a) {....}

otherwise you will have expensive float-int conversions.

like image 33
Paul R Avatar answered Oct 13 '22 02:10

Paul R


Your example shows how bad compiler-generated codes can be :

It loads a value with NEON just to convert it to int, then does a NEON->ARM transfer that causes a pipeline flush resulting in 11~14 cycles wasted.

The best solution would be writing the function completely in hand assembly.

However, there is a simple trick that allows fast float comparisons without typecasting AND truncation:

Threshold positive (exactly as fast as int comparison) :

void example(float * pSrc, float threshold, unsigned int count)
{
  typedef union {
    int ival,
    unsigned int uval,
    float fval
  } unitype;

  unitype v, t;
  if (count==0) return;
  t.fval = threshold;
  do {
    v.fval = *pSrc++;
    if (v.ival < t.ival) {
      // your code here
    }
    else {
      // your code here (optional)
    }
  } while (--count);
}

Threshold negative (1 cycle more per value than int comparison):

void example(float * pSrc, float threshold, unsigned int count)
{
  typedef union {
    int ival,
    unsigned int uval,
    float fval
  } unitype;

  unitype v, t, temp;
  if (count==0) return;
  t.fval = threshold;
  t.uval &= 0x7fffffff;
  do {
    v.fval = *pSrc++;
    temp.uval = v.uval ^ 0x80000000;
    if (temp.ival >= t.ival) {
      // your code here
    }
    else {
      // your code here (optional)
    }
  } while (--count);
}

I think it to be quite a lot faster than the accepted one above. Again, I'm a bit too late.

like image 24
Jake 'Alquimista' LEE Avatar answered Oct 13 '22 04:10

Jake 'Alquimista' LEE