Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should I worry about precision when I use C++ mathematical functions with integers?

For example, The code below will give undesirable result due to precision of floating point numbers.

double a = 1 / 3.0;
int b = a * 3;      // b will be 0 here

I wonder whether similar problems will show up if I use mathematical functions. For example

int a = sqrt(4);       // Do I have guarantee that I will always get 2 here?
int b = log2(8);       // Do I have guarantee that I will always get 3 here?

If not, how to solve this problem?

Edit:

Actually, I came across this problem when I was programming for an algorithm task. There I want to get

the largest integer which is power of 2 and is less than or equal to integer N

So round function can not solve my problem. I know I can solve this problem through a loop, but it seems not very elegant.

I want to know if

int a = pow(2, static_cast<int>(log2(N)));

can always give correct result. For example if N==8, is it possible that log2(N) gives me something like 2.9999999999999 and the final result become 4 instead of 8?

like image 960
delphifirst Avatar asked Feb 11 '23 19:02

delphifirst


1 Answers

Inaccurate operands vs inaccurate results

I wonder whether similar problems will show up if I use mathematical functions.

Actually, the problem that could prevent log2(8) to be 3 does not exist for basic operations (including *). But it exists for the log2 function.

You are confusing two different issues:

double a = 1 / 3.0;
int b = a * 3;      // b will be 0 here

In the example above, a is not exactly 1/3, so it is possible that a*3 does not produce 1.0. The product could have happened to round to 1.0, it just doesn't. However, if a somehow had been exactly 1/3, the product of a by 3 would have been exactly 1.0, because this is how IEEE 754 floating-point works: the result of basic operations is the nearest representable value to the mathematical result of the same operation on the same operands. When the exact result is representable as a floating-point number, then that representation is what you get.

Accuracy of sqrt and log2

sqrt is part of the “basic operations”, so sqrt(4) is guaranteed always, with no exception, in an IEEE 754 system, to be 2.0.

log2 is not part of the basic operations. The result of an implementation of this function is not guaranteed by the IEEE 754 standard to be the closest to the mathematical result. It can be another representable number further away. So without more hypotheses on the log2 function that you use, it is impossible to tell what log2(8.0) can be.

However, most implementations of reasonable quality for elementary functions such as log2 guarantee that the result of the implementation is within 1 ULP of the mathematical result. When the mathematical result is not representable, this means either the representable value above or the one below (but not necessarily the closest one of the two). When the mathematical result is exactly representable (such as 3.0), then this representation is still the only one guaranteed to be returned.

So about log2(8), the answer is “if you have a reasonable quality implementation of log2, you can expect the result to be 3.0`”.

Unfortunately, not every implementation of every elementary function is a quality implementation. See this blog post, caused by a widely used implementation of pow being inaccurate by more than 1 ULP when computing pow(10.0, 2.0), and thus returning 99.0 instead of 100.0.

Rounding to the nearest integer

Next, in each case, you assign the floating-point to an int with an implicit conversion. This conversion is defined in the C++ standard as truncating the floating-point values (that is, rounding towards zero). If you expect the result of the floating-point computation to be an integer, you can round the floating-point value to the nearest integer before assigning it. It will help obtain the desired answer in all cases where the error does not accumulate to a value larger than 1/2:

int b = std::nearbyint(log2(8.0));

To conclude with a straightforward answer to the question the the title: yes, you should worry about accuracy when using floating-point functions for the purpose of producing an integral end-result. These functions do not come even with the guarantees that basic operations come with.

like image 84
Pascal Cuoq Avatar answered Feb 15 '23 09:02

Pascal Cuoq