How can I get GCC to optimize this bit-shifting instruction into a move?

I'm trying to use the following code to emulate a 16-bit half-float in software:

typedef struct half
{
    unsigned short mantissa:10;
    unsigned short exponent:5;
    unsigned short sign:1;
} half;

unsigned short from_half(half h)
{
    return h.mantissa | h.exponent << 10 | h.sign << 15;
}

half to_half(unsigned short s)
{
    half result = { s, s >> 10, s >> 15 };
    return result;
}

I set this up so that it could easily be optimized into a move instruction, but lo and behold, in from_half, GCC does the bit-shifting anyway (even at -O3):

from_half:
        mov     edx, edi
        mov     eax, edi
        and     di, 1023
        shr     dx, 15
        and     eax, 31744
        movzx   edx, dl
        sal     edx, 15
        or      eax, edx
        or      eax, edi
        ret

while to_half is optimized nicely:

to_half:
        mov     eax, edi
        ret

Godbolt

I've tried different optimization levels (-O1, -O2, -Os) but none optimize it into what I was hoping.

Clang does what I would expect, even at -O1:

from_half:                              # @from_half
        mov     eax, edi
        ret
to_half:                                # @to_half
        mov     eax, edi
        ret

Godbolt

How can I get GCC to optimize this into a move? Why isn't it optimized that way already?

In addition to Booboo's answer, you can try the following, which answers your question:

How can I get GCC to optimize this into a move?

Just cast each bit-field term (after any shift) to unsigned short:

unsigned short from_half(half h)
{
    return (unsigned short)h.mantissa | (unsigned short)(h.exponent << 10) | (unsigned short)(h.sign << 15);
}

https://godbolt.org/z/CfZSgC

It results in:

from_half:
        mov     eax, edi
        ret
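The cast only changes how the expression is written, not what it computes. As a sanity check, here is a minimal sketch that compares the original and the cast version over every 16-bit pattern. The names from_half_plain, from_half_cast and check_all_patterns are made up for this example, and to_half is written with explicit masks instead of the brace initializer just to keep conversion warnings quiet.

```c
#include <assert.h>

typedef struct half
{
    unsigned short mantissa:10;
    unsigned short exponent:5;
    unsigned short sign:1;
} half;

/* Original form from the question: each bit-field is promoted
   before the shift and the OR. */
unsigned short from_half_plain(half h)
{
    return h.mantissa | h.exponent << 10 | h.sign << 15;
}

/* Form from this answer: every term is cast back to unsigned short. */
unsigned short from_half_cast(half h)
{
    return (unsigned short)h.mantissa
         | (unsigned short)(h.exponent << 10)
         | (unsigned short)(h.sign << 15);
}

/* Same field values as the question's to_half, with explicit masks. */
half to_half(unsigned short s)
{
    half result;
    result.mantissa = s & 0x3FF;
    result.exponent = (s >> 10) & 0x1F;
    result.sign = (s >> 15) & 0x1;
    return result;
}

/* Hypothetical helper: both variants must reproduce every 16-bit input. */
void check_all_patterns(void)
{
    for (unsigned long s = 0; s <= 0xFFFF; ++s) {
        half h = to_half((unsigned short)s);
        assert(from_half_plain(h) == s);
        assert(from_half_cast(h) == s);
    }
}
```

Calling check_all_patterns() from main should complete without tripping an assertion, so the two functions are interchangeable as far as results go. Whether a given GCC version folds either one into a single move is something to confirm on Godbolt for that version.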

Why isn't it optimized that way already?

I am not sure I have a solid answer on this one. Apparently the intermediate promotion of the bit-fields to int confuses the optimizer... But this is just a guess.
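To make the promotion mentioned above visible, here is a small sketch using C11 _Generic to ask the compiler what type each expression has. The function names are invented for this example, and the results assume GCC or Clang on a typical target where int is wider than 16 bits. It shows only that the types differ; it does not prove that this is what trips up the optimizer.

```c
typedef struct half
{
    unsigned short mantissa:10;
    unsigned short exponent:5;
    unsigned short sign:1;
} half;

/* Returns 1 if the shifted bit-field expression has type int. */
int shifted_field_is_int(void)
{
    half h = { 0, 0, 0 };
    return _Generic(h.exponent << 10, int: 1, default: 0);
}

/* Returns 1 if the cast brings the term back to unsigned short. */
int cast_field_is_unsigned_short(void)
{
    half h = { 0, 0, 0 };
    return _Generic((unsigned short)(h.exponent << 10),
                    unsigned short: 1, default: 0);
}
```

Under those assumptions both functions return 1: without the cast the shift and the OR are done on int values, and with the cast each term is narrowed to 16 bits before it is combined.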

Tags: c, compiler-optimization, gcc, bit-fields

S.S. Anne

Alex Lop.
