I'm trying to use the following code to emulate a 16-bit half-float in software:
typedef struct half
{
unsigned short mantissa:10;
unsigned short exponent:5;
unsigned short sign:1;
} half;
unsigned short from_half(half h)
{
return h.mantissa | h.exponent << 10 | h.sign << 15;
}
half to_half(unsigned short s)
{
half result = { s, s >> 10, s >> 15 };
return result;
}
I set this up so that it could easily be optimized into a move instruction, but lo and behold, in from_half
, GCC does the bit-shifting anyway (even at -O3
):
from_half:
mov edx, edi
mov eax, edi
and di, 1023
shr dx, 15
and eax, 31744
movzx edx, dl
sal edx, 15
or eax, edx
or eax, edi
ret
while to_half
is optimized nicely:
to_half:
mov eax, edi
ret
Godbolt
I've tried different optimization levels (-O1
, -O2
, -Os
) but none optimize it into what I was hoping.
Clang does this how I would expect even at -O1
:
from_half: # @from_half
mov eax, edi
ret
to_half: # @to_half
mov eax, edi
ret
Godbolt
How can I get GCC to optimize this into a move? Why isn't it optimized that way already?
GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall compiler optimization level is controlled by the command line option -On, where n is the required optimization level, as follows: -O0 . (default).
The compiler optimizes to reduce the size of the binary instead of execution speed. If you do not specify an optimization option, gcc attempts to reduce the compilation time and to make debugging always yield the result expected from reading the source code.
GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O , this option increases both compilation time and the performance of the generated code.
Compiler specific pragma gcc provides pragma GCC as a way to control temporarily the compiler behavior. By using pragma GCC optimize("O0") , the optimization level can be set to zero, which means absolutely no optimize for gcc.
In addition to Booboo's answer, you can try the following which answers your question
How can I get GCC to optimize this into a move?
Just cast each shifted bit-field expression to unsigned short
unsigned short from_half(half h)
{
return (unsigned short)h.mantissa | (unsigned short)(h.exponent << 10) | (unsigned short)(h.sign << 15);
}
https://godbolt.org/z/CfZSgC
It results in:
from_half:
mov eax, edi
ret
Why isn't it optimized that way already?
I am not sure I have a solid answer on this one. Apparently the intermediate promotion of the bit-fields to int
confuses the optimizer... But this is just a guess.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With