Can a static_cast<float> from double, assigned to double, be optimized away?

I stumbled on a function that I think is unnecessary, and generally scares me:

float coerceToFloat(double x) {
    volatile float y = static_cast<float>(x);
    return y;
}

Which is then used like this:

// double x
double y = coerceToFloat(x);

Is this ever any different from just doing the following?

double y = static_cast<float>(x);

The intention seems to be to just strip the double down to single precision. It smells like something written out of extreme paranoia.

asked Nov 02 '18 by Ben



3 Answers

static_cast<float>(x) is required to remove any excess precision, producing a float. While the C++ standard generally permits implementations to retain excess floating-point precision in expressions, that precision must be removed by cast and assignment operators.

The license to use greater precision is in C++ draft N4659 clause 8, paragraph 13:

The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.64

Footnote 64 says:

The cast and assignment operators must still perform their specific conversions as described in 8.4, 8.2.9 and 8.18.
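
As a brief illustration of that rule (a minimal sketch; the chosen value 0.1 and the printing setup are my own, not part of the answer), the cast below must round the double to float precision even though the result is stored back into a double:

#include <iostream>
#include <iomanip>

int main() {
    double x = 0.1;                       // not exactly representable in binary
    double y = static_cast<float>(x);     // the cast must round to float precision
    std::cout << std::setprecision(20) << x << '\n'
              << std::setprecision(20) << y << '\n';
}

On a conforming implementation the second line prints the float-rounded value of 0.1, not the original double, regardless of any excess precision used in intermediate expressions.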

answered Oct 07 '22 by Eric Postpischil


Following up on the comment by @NathanOliver -- compilers are allowed to do floating-point math at higher precision than the types of the operands require. Typically on x86 that means that they do everything as 80-bit values, because that's the most efficient in the hardware. It's only when a value is stored that it has to be reverted to the actual precision of the type. And even then, most compilers by default will do optimizations that violate this rule, because forcing that change in precision slows down the floating-point operations. Most of the time that's okay, because the extra precision isn't harmful. If you're a stickler, you can use a command-line switch to force the compiler to honor that storage rule, and you might see that your floating-point calculations are significantly slower.

In that function, marking the variable volatile tells the compiler that it cannot elide storing that value; that, in turn, means that it has to reduce the precision of the incoming value to match the type that it's being stored in. So the hope is that this would force truncation.

And, no, writing a cast instead of calling that function is not the same, because the compiler (in its non-conforming mode) can skip the assignment to y if it determines that it can generate better code without storing the value, and it can skip the truncation as well. Keep in mind that the goal is to run floating-point calculations as fast as possible, and having to deal with niggling rules about reducing precision for intermediate values just slows things down.
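
A minimal sketch of that comparison (the main routine and the flags mentioned in the comments are illustrative assumptions, not something the answer prescribes):

float coerceToFloat(double x) {
    volatile float y = static_cast<float>(x);   // volatile store forces rounding to float
    return y;
}

int main() {
    double x = 0.1;
    double a = coerceToFloat(x);                // goes through a forced float store
    double b = static_cast<float>(x);           // plain cast; a non-conforming fast-math
                                                // mode may skip the rounding here
    return a == b ? 0 : 1;                      // 0 expected on a conforming compiler
}

Under a strict mode (for example GCC with -ffloat-store, or SSE math) both paths should yield the same float-rounded value; the volatile version exists to defend against the non-conforming fast modes.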

In most cases, running flat-out by skipping intermediate truncations is what serious floating-point applications need. The rule requiring truncation on storage is more of a hope than a realistic requirement.

On a side note, Java originally required that all floating-point math be done at the exact precision required by the types involved. You can do that on Intel hardware by telling it not to extend fp types to 80 bits. This was met with loud complaints from number crunchers because that makes calculations much slower. Java soon changed to the notion of "strict" fp and "non-strict" fp, and serious number crunching uses non-strict, i.e., make it as fast as the hardware supports. People who thoroughly understand floating-point math (that does not include me) want speed, and know how to cope with the differences in precision that result.

answered Oct 07 '22 by Pete Becker


Some compilers have the concept of "extended precision", where doubles carry more than 64 bits of data. This results in floating-point calculations that don't match the IEEE standard.

The above code could be an attempt to prevent extended-precision compiler flags from optimizing away the intended precision loss. Such flags explicitly violate the precision assumptions for double and float values, but it seems plausible that they wouldn't do so for a volatile variable.
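
As a side note (this check is my addition, not from the answer), one quick way to see whether an implementation evaluates floating-point expressions with extra precision is the FLT_EVAL_METHOD macro from <cfloat> (0 means expressions are evaluated at their declared types, 2 means they are evaluated as long double, as on classic x87 builds):

#include <cfloat>
#include <iostream>

int main() {
    std::cout << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << '\n';
}

This only reports the documented evaluation method; fast-math style flags can still cause deviations from it.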

answered Oct 07 '22 by Yakk - Adam Nevraumont