I've often noticed gcc converting multiplications into shifts in the executable. Something similar might happen when multiplying an int and a float. For example, 2 * f might simply increment the exponent of f by 1, saving some cycles. Do compilers generally do this, perhaps if one requests it (e.g. via -ffast-math)?
Are compilers generally smart enough to do this, or do I need to do it myself using the scalb*() or ldexp()/frexp() function families?
Floating-point decimal values generally do not have an exact binary representation; this is a consequence of how floating-point numbers are stored in binary. For this reason, you may experience some loss of precision, and some floating-point operations may produce unexpected results.
A float has 23 stored bits of mantissa (24 significant bits counting the implicit leading 1), and 2^23 is 8,388,608. Those bits let you store all 6-digit decimal numbers and most 7-digit ones. This means that floating-point numbers have between 6 and 7 decimal digits of precision, regardless of the exponent.
The result of multiplying a float and an integer is always a float: the integer operand is converted to floating point before the multiplication. If you need to round the result to the nearest integer afterwards, you can use the round() function.
To store negative exponents without a separate sign bit, floating-point uses a biased exponent, which is the actual exponent plus a constant bias. 32-bit floating-point uses a bias of 127. For example, for the exponent 7, the biased exponent is 7 + 127 = 134 = 10000110 in binary. For the exponent −4, the biased exponent is −4 + 127 = 123 = 01111011 in binary.
For example, 2 * f might simply increment the exponent of f by 1, saving some cycles.
This simply isn't true.
First, you have too many corner cases: zero, infinity, NaN, and denormals. Then you have the performance issue.
The misconception here is that incrementing the exponent would be faster than doing a multiplication; it is not.
If you look at the hardware instructions, there is no direct way to increment the exponent. So what you would need to do instead is:
1. Move the value from the floating-point unit to the integer unit.
2. Add 1 to the exponent field using integer arithmetic.
3. Move the value back to the floating-point unit.
There is generally a medium to large latency for moving data between the integer and floating-point execution units. So in the end, this "optimization" becomes much worse than a simple floating-point multiply.
So the reason why the compiler doesn't do this "optimization" is because it isn't any faster.
On modern CPUs, multiplication typically has one-per-cycle throughput and low latency. If the value is already in a floating point register, there's no way you'll beat that by juggling it around to do integer arithmetic on the representation. If it's in memory to begin with, and if you're assuming neither the current value nor the correct result would be zero, denormal, nan, or infinity, then it might be faster to perform something like
addl $0x100000, 4(%eax) # x86 asm example
to multiply by two; the only time I could see this being beneficial is if you're operating on a whole array of floating-point data that's bounded away from zero and infinity, and scaling by a power of two is the only operation you'll be performing (so you don't have any existing reason to be loading the data into floating point registers).