How does this math rounding function work?

Tags: c, math, rounding

Can anyone explain what this function does?

static inline void round_to_zero(volatile float *f)
{
  *f += 1e-18;
  *f -= 1e-18;
}

I mean, I understand that it adds 1e-18 and subtracts it again, but I don't understand what effect that has on a float passed to it. The reason I am trying to understand it is that I am using doubles in some code that uses this function (which I have converted from floats). It's audio code, and the above function comes from this library:

https://github.com/swh/lv2/blob/master/include/ladspa-util.h

I am wondering if it will work on a double as is, or needs to be modified for the extra precision a double has. I suspect this knocks off the last few bits of data, erasing them from the float if they are there, although I don't quite understand how. But I imagine if this is what it does, I will need to change the exponent to suit a double.

TIA, Pete

asked Nov 01 '22 16:11 by Pete

2 Answers

The following code demonstrates what that function does.

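#include <stdio.h>

/* round_to_zero() as defined in ladspa-util.h */
static inline void round_to_zero(volatile float *f)
{
  *f += 1e-18;
  *f -= 1e-18;
}

int main( void )
{
    float a;

    a = -1.0;
    a /= 1e100;          /* too small for a float: a becomes -0.0 */
    printf( "%f\n", a ); /* prints -0.000000 */

    round_to_zero( &a );
    printf( "%f\n", a ); /* prints 0.000000 */
    return 0;
}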

The thing you need to know is that IEEE-754 floating point numbers have two possible values for 0. There's a positive 0 and a negative 0. The round_to_zero function converts negative 0 to positive 0.

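The value 1e-18 is far smaller than 1 lsb of the double precision number 1.0 (about 2.2e-16), so adding and subtracting it leaves normal-sized samples essentially unchanged, while values far below 1e-18 (which includes every denormal double) are wiped out. So I don't think any modifications are necessary to use that function with double (other than changing the argument type, of course).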
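A minimal sketch of the double version suggested above, assuming the only change is the pointer type (the names `round_to_zero_d` and `flushed_d` are hypothetical, not part of ladspa-util.h). Because 1e-18 is already a double literal, both steps run entirely in double arithmetic, so a denormal double such as 1e-310 comes back as exactly +0.0, as does -0.0, while an ordinary sample like 0.5 is returned unchanged:

```c
#include <math.h>

/* Hypothetical double-precision counterpart of round_to_zero() from
 * ladspa-util.h: only the pointer type differs from the original. */
static inline void round_to_zero_d(volatile double *f)
{
    *f += 1e-18;   /* a value far below 1e-18 is absorbed by the sum */
    *f -= 1e-18;   /* subtracting the same constant then leaves +0.0 */
}

/* Hypothetical wrapper so the result can be checked by value. */
double flushed_d(double x)
{
    volatile double v = x;
    round_to_zero_d(&v);
    return v;
}
```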

answered Nov 15 '22 06:11 by user3386109


Thought I should come back to this to add the following details.

While the answer about converting a negative zero to a positive one is true and was useful to me, there's more to it than that.

Adding 1e-18 and then subtracting it from a float does indeed wipe out very small values stored in the float. This is used in audio applications because filters can recirculate small values through feedback paths that continually divide them, producing ever smaller numbers. Once a number becomes denormalised (as Caskey mentioned), arithmetic on it can be up to 100x slower on many CPUs, x86 included.
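To illustrate the recirculation problem, here is a minimal sketch of a hypothetical feedback path that halves its state every sample with no new input (the function names and the 0.5f factor are illustrative assumptions, and it uses the float literal 1e-18f rather than the library's double constant 1e-18). Without flushing, the state passes through a run of subnormal values before it finally underflows to zero; with the add/subtract trick applied every sample, the state never holds a subnormal value:

```c
#include <math.h>

/* Float variant of the trick; 1e-18f is an assumption (the library uses 1e-18). */
static inline void flush_tiny(volatile float *f)
{
    *f += 1e-18f;
    *f -= 1e-18f;
}

/* Hypothetical feedback loop: the state is halved each sample.
 * Returns how many samples left the state holding a subnormal value. */
int count_subnormal_samples(int use_flush, int samples)
{
    volatile float state = 1.0f;
    int subnormal = 0;

    for (int i = 0; i < samples; i++) {
        state *= 0.5f;                 /* recirculate and attenuate */
        if (use_flush)
            flush_tiny(&state);        /* knock tiny values out of the state */
        if (fpclassify(state) == FP_SUBNORMAL)
            subnormal++;
    }
    return subnormal;
}
```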

By adding a number much larger than the denormal range for that data type, you wipe out the tiny value stored in it. Subtracting the same larger value then leaves the variable holding zero, which causes no slowdown when it is processed. The tiny value is lost because the significand of the type is not wide enough to hold both the very tiny value and the much larger value you just added.
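A minimal sketch of that absorption, using the float literal 1e-18f. This is an assumption: ladspa-util.h writes 1e-18, which is a double constant, so for a float argument the arithmetic is done in double and converted back, which can leave a tiny but normal (non-denormal) value instead of exact zero. A float significand has 24 bits, so next to 1e-18f anything below roughly 5e-26 does not fit and is lost, while an ordinary sample such as 0.25f comes back exactly; the helper name `flushed` is hypothetical:

```c
#include <math.h>

/* Add and subtract a much larger constant. 1e-18f is an assumed float
 * literal; ladspa-util.h itself uses the double constant 1e-18. */
static inline void flush_tiny(volatile float *f)
{
    *f += 1e-18f;  /* a value below about 5e-26 does not fit beside 1e-18f */
    *f -= 1e-18f;  /* for such values this leaves exactly 0.0f */
}

/* Hypothetical wrapper that returns the flushed value. */
float flushed(float x)
{
    volatile float v = x;
    flush_tiny(&v);
    return v;
}
```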

For example:

Start with an audio sample with a value of 1.0f.

Pass it 40 times through a function that divides by 10, resulting in a value of 1e-40.

v = 0.0100000e-38 (a float holds roughly 7 significant decimal digits with a decimal exponent down to about -38, so in decimal terms it is stored roughly as written here; the real encoding is binary).

This is now a denormal value for a float, and it will cause a CPU to process it very slowly. How do we get rid of the slowdown? Make it zero:

Add 1e-18; result: 1.000000e-18 (the original 1e-40 is too small to be represented in the roughly 7-digit significand once it is already holding the much larger 1e-18 value).

Then subtract the 1e-18 value; result: 0.0

Hence we produce zero, wiping out the original denormal value, and our CPU thanks us.
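The walk-through above can be reproduced in a short sketch (the function names are hypothetical, and 1e-18f is used as a float literal in place of the library's double constant 1e-18). Dividing 1.0f by 10 forty times leaves a value near 1e-40, which is below the smallest normal float and therefore subnormal; adding 1e-18f absorbs it completely, and subtracting 1e-18f then leaves exactly zero:

```c
#include <math.h>  /* fpclassify(), handy for checking the results */

/* Hypothetical stand-in for a filter stage that keeps dividing its input. */
float divide_by_ten(float v, int passes)
{
    for (int i = 0; i < passes; i++)
        v /= 10.0f;
    return v;
}

/* First step of the walk-through: add the large constant. */
float add_large(float v)
{
    volatile float t = v;
    t += 1e-18f;                  /* a value near 1e-40 is lost beside 1e-18 */
    return t;
}

/* Second step: subtract the same constant again. */
float subtract_large(float v)
{
    volatile float t = v;
    t -= 1e-18f;
    return t;
}
```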

answered Nov 15 '22 07:11 by Pete