Why does GCC generate a strange way to move the stack pointer?

Tags:

c++

x86

gcc

assembly

stack-pointer

I have observed that GCC's C++ compiler generates the following assembler code:

sub    $0xffffffffffffff80,%rsp

This is equivalent to

add    $0x80,%rsp

i.e. remove 128 bytes from the stack.

Why does GCC generate the sub variant rather than the add variant? The add variant seems far more natural to me than exploiting an underflow.

This only occurred once in a fairly large code base, and I have no minimal C++ example that triggers it. I am using GCC 7.5.0.

409

asked Dec 22 '21 17:12

Heygard Flisch

1 Answer

Try assembling both and you'll see why.

   0:   48 83 ec 80             sub    $0xffffffffffffff80,%rsp
   4:   48 81 c4 80 00 00 00    add    $0x80,%rsp

The sub version is three bytes shorter.

This is because the add and sub immediate instructions on x86 have two forms: one takes an 8-bit sign-extended immediate, the other a 32-bit sign-extended immediate. See https://www.felixcloutier.com/x86/add; the relevant forms are (in Intel syntax) add r/m64, imm8 and add r/m64, imm32. The 32-bit form is three bytes longer, since its immediate occupies four bytes instead of one.
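To make the size difference concrete, here is a minimal Python sketch that hand-assembles both forms for %rsp only. The helper name encode_rsp_imm is hypothetical (it is not part of any assembler library). The byte values it uses (REX.W prefix 0x48, opcodes 0x83 and 0x81, opcode extensions /0 for add and /5 for sub) come from the reference linked above and match the disassembly shown earlier.

```python
def encode_rsp_imm(op, imm):
    """Encode `op $imm, %rsp` (op is "add" or "sub") in the shortest immediate form.

    Hypothetical helper for illustration: it only handles %rsp as the destination.
    """
    ext = {"add": 0, "sub": 5}[op]      # opcode extension stored in ModRM.reg
    modrm = 0xC0 | (ext << 3) | 4       # mod=11 (register direct), rm=4 (%rsp)
    rex_w = 0x48                        # REX.W: 64-bit operand size
    if -0x80 <= imm <= 0x7F:
        # imm8 form: opcode 0x83, one immediate byte, sign-extended by the CPU
        return bytes([rex_w, 0x83, modrm, imm & 0xFF])
    if -0x80000000 <= imm <= 0x7FFFFFFF:
        # imm32 form: opcode 0x81, four little-endian immediate bytes
        return bytes([rex_w, 0x81, modrm]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")
    raise ValueError("immediate does not fit in a sign-extended 32-bit field")


print(encode_rsp_imm("sub", -0x80).hex(" "))  # 48 83 ec 80
print(encode_rsp_imm("add", 0x80).hex(" "))   # 48 81 c4 80 00 00 00
```

The immediate -0x80 fits in one byte, while +0x80 falls just outside the signed 8-bit range and forces the four-byte immediate.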

The number 0x80 can't be represented as an 8-bit signed immediate; since the high bit is set, it would sign-extend to 0xffffffffffffff80 instead of the desired 0x0000000000000080. So add $0x80, %rsp would have to use the 32-bit form add r/m64, imm32. On the other hand, 0xffffffffffffff80 would be just what we want if we subtract instead of adding, and so we can use sub r/m64, imm8, giving the same effect with smaller code.
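The sign-extension argument can be checked with a few lines of Python. This is only a rough model of the arithmetic, not of the CPU: the function names and the stack pointer value are made up for illustration, and flags are ignored.

```python
MASK64 = (1 << 64) - 1  # 64-bit registers wrap modulo 2**64


def sign_extend(value, bits):
    """Read the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def add_imm8(reg, imm8):
    """Model `add $imm8, reg` on a 64-bit register (flags ignored)."""
    return (reg + sign_extend(imm8, 8)) & MASK64


def sub_imm8(reg, imm8):
    """Model `sub $imm8, reg` on a 64-bit register (flags ignored)."""
    return (reg - sign_extend(imm8, 8)) & MASK64


rsp = 0x00007FFC12345000  # hypothetical stack pointer value

print(hex(sign_extend(0x80, 8) & MASK64))  # 0xffffffffffffff80
print(hex(add_imm8(rsp, 0x80)))            # rsp - 0x80: not what we want
print(hex(sub_imm8(rsp, 0x80)))            # rsp + 0x80: the desired result
```

With the immediate byte 0x80, the imm8 form of add moves the register down by 128, whereas the imm8 form of sub moves it up by 128, which is the effect the compiler is after.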

I wouldn't really say it's "exploiting an underflow". I'd just interpret it as sub $-0x80, %rsp. The compiler is just choosing to emit 0xffffffffffffff80 instead of the equivalent -0x80; it doesn't bother to use the more human-readable version.

Note that 0x80 is actually the only possible number for which this trick is relevant; it's the unique nonzero 8-bit number which is its own negative mod 2^8. Any smaller positive number can just use add with imm8, and any larger number has to use 32 bits anyway. In fact, 0x80 is the only reason that we couldn't just omit sub r/m, imm8 from the instruction set and always use add with negative immediates in its place.

I guess a similar trick does come up if we want to do a 64-bit add of 0x0000000080000000; sub will do it (with the imm32 0x80000000, which sign-extends to -0x80000000), but add can't be used at all, as there is no imm64 version; we'd have to load the constant into another register first.
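As a sanity check on that claim, the following Python sketch compares the immediate sizes that add and sub would need for a given adjustment. The function names are hypothetical, and the brute-force scan covers only a small window of values rather than the full 32-bit range.

```python
def imm_bytes(value):
    """Size in bytes of the smallest sign-extended immediate that holds `value`,
    or None if neither imm8 nor imm32 can hold it."""
    if -2**7 <= value < 2**7:
        return 1
    if -2**31 <= value < 2**31:
        return 4
    return None


def sub_beats_add(delta):
    """True if adjusting a 64-bit register by `delta` is encodable with sub
    and is either shorter than, or impossible with, add."""
    add_size = imm_bytes(delta)    # add $delta, reg
    sub_size = imm_bytes(-delta)   # sub $-delta, reg
    if sub_size is None:
        return False
    return add_size is None or sub_size < add_size


print([d for d in range(-1000, 1001) if sub_beats_add(d)])  # [128]
print(sub_beats_add(2**31))                                 # True
```

Within the scanned window only +128 favours sub, and +2^31 is the analogous edge case for the 32-bit immediate.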

160

answered Oct 19 '22 04:10

Nate Eldredge
