 

Why does using the ternary operator to return a string generate considerably different code from returning in an equivalent if/else block?

I was playing with the Compiler Explorer and I stumbled upon an interesting behavior with the ternary operator when using something like this:

    std::string get_string(bool b) {
        return b ? "Hello" : "Stack-Overflow";
    }

The code the compiler generates for this (clang trunk, with -O3) is:

    get_string[abi:cxx11](bool):            # @get_string[abi:cxx11](bool)
            push    r15
            push    r14
            push    rbx
            mov     rbx, rdi
            mov     ecx, offset .L.str
            mov     eax, offset .L.str.1
            test    esi, esi
            cmovne  rax, rcx
            add     rdi, 16                 #< Why is the compiler storing the length of the string
            mov     qword ptr [rbx], rdi
            xor     sil, 1
            movzx   ecx, sil
            lea     r15, [rcx + 8*rcx]
            lea     r14, [rcx + 8*rcx]
            add     r14, 5                  #< I also think this is the length of "Hello" (but not sure)
            mov     rsi, rax
            mov     rdx, r14
            call    memcpy                  #< Why is there a call to memcpy
            mov     qword ptr [rbx + 8], r14
            mov     byte ptr [rbx + r15 + 21], 0
            mov     rax, rbx
            pop     rbx
            pop     r14
            pop     r15
            ret
    .L.str:
            .asciz  "Hello"

    .L.str.1:
            .asciz  "Stack-Overflow"

However, the code generated for the following snippet is considerably smaller: it has no call to memcpy and never needs the lengths of both strings at the same time. Instead, it branches to one of two code paths, one per string.

    std::string better_string(bool b) {
        if (b)
        {
            return "Hello";
        }
        else
        {
            return "Stack-Overflow";
        }
    }

The code the compiler generates for the above snippet (clang trunk, with -O3) is:

    better_string[abi:cxx11](bool):         # @better_string[abi:cxx11](bool)
            mov     rax, rdi
            lea     rcx, [rdi + 16]
            mov     qword ptr [rdi], rcx
            test    sil, sil
            je      .LBB0_2
            mov     dword ptr [rcx], 1819043144
            mov     word ptr [rcx + 4], 111
            mov     ecx, 5
            mov     qword ptr [rax + 8], rcx
            ret
    .LBB0_2:
            movabs  rdx, 8606216600190023247
            mov     qword ptr [rcx + 6], rdx
            movabs  rdx, 8525082558887720019
            mov     qword ptr [rcx], rdx
            mov     byte ptr [rax + 30], 0
            mov     ecx, 14
            mov     qword ptr [rax + 8], rcx
            ret

I get the same (branching) result when I use the ternary operator with explicit std::string operands:

    std::string get_string(bool b) {
        return b ? std::string("Hello") : std::string("Stack-Overflow");
    }

I would like to know why the ternary operator in the first example makes the compiler generate that code. I believe the culprit is the const char[] operands.

P.S.: GCC emits calls to strlen in the first example, but Clang doesn't.

Link to the Compiler Explorer example: https://godbolt.org/z/Exqs6G

Thank you for your time!

sorry for the wall of code

Marius T asked Aug 09 '20 14:08



1 Answer

The overarching difference here is that the first version is branchless.

16 isn’t the length of any string here (the longer one, with NUL, is only 15 bytes long); it’s an offset into the return object (whose address is passed in RDI to support RVO). RDI+16 is the address of the buffer inside the object itself, and storing it as the string’s data pointer indicates that the small-string optimization is in use (note the lack of allocation). The length is either 5 or 5+1+8 = 14; it is computed in R14, which is stored in the std::string and also passed to memcpy (along with a source pointer chosen by CMOVNE) to copy the actual string bytes.
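As a rough illustration, the branchless sequence can be written back as C++. This is a minimal sketch, not the compiler's actual transformation: `get_string_branchless` is a hypothetical name, the local `buf` merely stands in for the inline buffer of the returned object, and whether the pointer selection becomes a CMOV is up to the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper that mirrors the branchless assembly step by step.
// It is not part of the question's code or of any library.
std::string get_string_branchless(bool b) {
    static const char hello[] = "Hello";
    static const char stack_overflow[] = "Stack-Overflow";

    // cmovne rax, rcx: pick the source pointer.
    const char* src = b ? hello : stack_overflow;

    // xor sil, 1 / lea [rcx + 8*rcx] / add 5:
    // length = 5 + 9 * (!b), i.e. 5 for "Hello", 14 for "Stack-Overflow".
    const std::size_t not_b = static_cast<std::size_t>(!b);
    const std::size_t len = 5 + 9 * not_b;
    assert(len == std::strlen(src));

    // call memcpy with a run-time length, then write the terminating NUL
    // (mov byte ptr [rbx + r15 + 21], 0, where 21 = 16 + 5).
    char buf[16];
    std::memcpy(buf, src, len);
    buf[len] = '\0';

    return std::string(buf, len);
}
```

Calling `get_string_branchless(true)` should give "Hello" and `get_string_branchless(false)` should give "Stack-Overflow"; the point is that both results come out of the same straight-line code, with only the pointer and the length depending on `b`.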

The other version has an obvious branch (although part of the std::string construction has been hoisted above it) and actually does have 5 and 14 explicitly, but is obfuscated by the fact that the string bytes have been included as immediate values (expressed as integers) of various sizes.
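To make the immediates less opaque, the stores from each branch can be replayed into a small buffer. This is only a sketch: the function names are hypothetical, the 16-byte `buf` stands in for the string's inline storage, and it assumes a little-endian target such as x86-64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Replays the stores of the "Hello" branch (hypothetical helper).
std::string replay_hello_stores() {
    char buf[16] = {};
    const std::uint32_t dword_imm = 1819043144u;    // 0x6C6C6548, bytes "Hell"
    const std::uint16_t word_imm = 111;             // 0x006F, bytes "o\0"
    std::memcpy(buf, &dword_imm, sizeof dword_imm); // mov dword ptr [rcx], ...
    std::memcpy(buf + 4, &word_imm, sizeof word_imm); // mov word ptr [rcx + 4], ...
    return std::string(buf, 5);                     // mov ecx, 5 (the length)
}

// Replays the stores of the "Stack-Overflow" branch (hypothetical helper).
std::string replay_stack_overflow_stores() {
    char buf[16] = {};
    const std::uint64_t tail = 8606216600190023247ull; // bytes "Overflow"
    const std::uint64_t head = 8525082558887720019ull; // bytes "Stack-Ov"
    std::memcpy(buf + 6, &tail, sizeof tail);  // mov qword ptr [rcx + 6], rdx
    std::memcpy(buf, &head, sizeof head);      // mov qword ptr [rcx], rdx
    buf[14] = '\0';                            // mov byte ptr [rax + 30], 0 (30 = 16 + 14)
    assert(std::strlen(buf) == 14);
    return std::string(buf, 14);               // mov ecx, 14 (the length)
}
```

The two 8-byte stores overlap by two bytes ("Ov"), which lets the compiler cover all 14 characters with two stores instead of using several smaller ones.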

As for why these three equivalent functions produce two different versions of the generated code, all I can offer is that optimizers are iterative and heuristic algorithms; they don’t reliably find the same “best” assembly independently of their starting point.
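One source-level difference in the starting point can at least be checked, without claiming it fully explains the optimizer's choice. Because the two literals have different array types, the conditional expression converts both to pointers, so the first version constructs the std::string from a single const char* whose length is only known at run time (compare the strlen calls GCC emits). A minimal sketch, reusing the function names from the question:

```cpp
#include <string>
#include <type_traits>

// "Hello" is const char[6] and "Stack-Overflow" is const char[15].
// The types differ, so both operands decay and the conditional
// expression has type const char*.
static_assert(
    std::is_same<decltype(true ? "Hello" : "Stack-Overflow"),
                 const char*>::value,
    "the ternary yields a pointer, not an array");

// One constructor call from a pointer chosen at run time.
std::string get_string(bool b) {
    return b ? "Hello" : "Stack-Overflow";
}

// Two separate constructor calls, each from a literal known at compile time.
std::string better_string(bool b) {
    if (b) {
        return "Hello";
    } else {
        return "Stack-Overflow";
    }
}
```

Both functions return the same strings for the same input; they differ only in how much is known at each constructor call site.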

Davis Herring answered Sep 22 '22 10:09