 

Ensure float to be smaller than exact value

I want to calculate a sum of the following form in C++:

float result = float(x1)/y1+float(x2)/y2+....+float(xn)/yn

xi, yi are all integers. The result will be an approximation of the actual value. It is crucial that this approximation is less than or equal to the actual value. I can assume that all my values are finite and positive. I tried using nextafterf(x, 0), as in this code snippet:

#include <cmath>
#include <iostream>
using namespace std;

int main()
{
    cout.precision( 15 );
    float a = 1.0f / 3.0f * 10; // approximately 3 1/3
    float b = 2.0f / 3.0f * 10; // approximately 6 2/3
    float af = nextafterf( a , 0 ); // next float below a
    float bf = nextafterf( b , 0 ); // next float below b
    cout << a << endl;
    cout << b << endl;
    cout << af << endl;
    cout << bf << endl;
    float sumf = 0.0f;
    for ( int i = 1; i <= 3; i++ )
    {
        sumf = sumf + bf;
    }
    sumf = sumf + af;
    cout << sumf << endl;
}

The exact result would be 3 * 6.666... + 3.333... = 23.333..., but as output I get:

3.33333349227905
6.66666698455811
3.33333325386047
6.66666650772095
23.3333339691162

Even though my summands are smaller than what they should represent, their sum is not. In this case applying nextafterf to sumf will give me 23.3333320617676 which is smaller. But does this always work? Is it possible that the rounding error gets so big that nextafterf still leaves me above the correct value?

I know that I could avoid this by implementing a class for fractions and calculating everything exactly. But I'm curious whether it is possible to achieve my goal with floats.

Ricardo asked Mar 15 '23 01:03



2 Answers

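Try changing the floating-point rounding mode to FE_TOWARDZERO. Since all your values are positive, rounding toward zero means every operation rounds down rather than to nearest.

For a code example, see the related question "Change floating point rounding mode".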
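
A minimal sketch of this idea, assuming the platform honors rounding-mode changes made at run time. The helper name lower_bound_sum_rounded and the pair-based input are hypothetical and not taken from the question. The sketch also assumes that every integer is positive and below 2^24, so that its conversion to float is exact; otherwise a denominator that was rounded down could push its quotient above the exact value. The volatile qualifiers are only there to discourage the compiler from evaluating the arithmetic at compile time or moving it across the fesetround call; some compilers may additionally need a flag such as GCC's -frounding-math.

```cpp
#include <cfenv>
#include <utility>
#include <vector>

// Hypothetical helper: sums x/y over (x, y) pairs with every float
// operation rounded toward zero. Assumes all values are positive and
// below 2^24, so the int-to-float conversions are exact.
float lower_bound_sum_rounded(const std::vector<std::pair<int, int>>& terms) {
    const int old_mode = std::fegetround();   // remember the caller's mode
    std::fesetround(FE_TOWARDZERO);           // truncate instead of round-to-nearest

    volatile float sum = 0.0f;
    for (const auto& term : terms) {
        volatile float num = static_cast<float>(term.first);
        volatile float den = static_cast<float>(term.second);
        volatile float quotient = num / den;  // rounded toward zero
        sum = sum + quotient;                 // rounded toward zero
    }

    std::fesetround(old_mode);                // restore the previous mode
    return sum;
}
```

Called with the pairs (20, 3), (20, 3), (20, 3), (10, 3) from the question, this should return a value that does not exceed 70/3. Restoring the previous mode at the end keeps the change from leaking into unrelated code.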

Support Ukraine answered Mar 23 '23 09:03



My immediate reaction is that the approach you're taking is fundamentally flawed.

The problem is that with floating point numbers, the size of the step that nextafter takes depends on the magnitude of the number involved. Let's consider a somewhat extreme example:

#include <iostream>
#include <iomanip>
#include <cmath>

int main() { 
    float num = 1.0e-10f;
    float denom = 1.0e10f;

    std::cout << std::setprecision(7) << num - std::nextafterf(num, 0) << "\n";
    std::cout << std::setprecision(7) << denom - std::nextafterf(denom, 0) << "\n";
}

Result:

6.938894e-018
1024

So, since the numerator is a lot smaller than the denominator, the increment is also much smaller.

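The consequence seems fairly clear: if you step both the numerator and the denominator toward zero, the result is not guaranteed to be smaller than the exact value. Shrinking the denominator pushes the quotient up, and that can outweigh the effect of shrinking the numerator.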

If you want to ensure the result is smaller than the correct number, the obvious choice would be to round the numerator down but the denominator up (i.e. nextafterf(denom, positive_infinity)). This way, you get a smaller numerator and a larger denominator, so the result is always smaller than the unmodified version would have been.
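
The suggestion above can be sketched roughly as follows, assuming IEEE 754 single precision with the default round-to-nearest mode and positive integers below 2^24 (so they convert to float exactly). The helper names quotient_below and lower_bound_sum_nudged are hypothetical. Stepping each partial sum toward zero is an extra precaution that goes beyond the paragraph above: every addition rounds as well, and in round-to-nearest the float just below a rounded result cannot exceed the exact result.

```cpp
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: a quotient nudged so that it stays below x / y.
// Assumes x and y are positive and below 2^24 (exact as floats).
float quotient_below(int x, int y) {
    const float inf = std::numeric_limits<float>::infinity();
    const float num = std::nextafterf(static_cast<float>(x), 0.0f); // numerator one step down
    const float den = std::nextafterf(static_cast<float>(y), inf);  // denominator one step up
    return num / den;
}

// Hypothetical helper: adds the nudged quotients and moves every partial
// sum one step toward zero, so the rounding of each addition cannot lift
// the running total above the exact sum.
float lower_bound_sum_nudged(const std::vector<std::pair<int, int>>& terms) {
    float sum = 0.0f;
    for (const auto& term : terms) {
        sum = std::nextafterf(sum + quotient_below(term.first, term.second), 0.0f);
    }
    return sum;
}
```

The price of this approach is a looser bound: each term and each partial sum gives up a step or two, so the gap to the exact value grows with the number of terms.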

Jerry Coffin answered Mar 23 '23 10:03
