Why does LLVM allocate a redundant variable?

Tags: c, llvm, llvm-codegen

Here's a simple C file with an enum definition and a main function:

enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    return 0;
}

Clang compiles it to the following LLVM IR:

define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4
  ret i32 0
}

%2 is evidently the d variable, which gets 2 assigned to it. What does %1 correspond to if zero is returned directly?

asked Jan 06 '20 10:01

macleginn

3 Answers

Clang generates %1 to handle multiple return statements in a function: it is a stack slot that holds the return value. Imagine you were writing a function to compute an integer's factorial. Instead of this:

int factorial(int n){
    int result;
    if(n < 2)
      result = 1;
    else{
      result = n * factorial(n-1);
    }
    return result;
}

you'd probably write this:

int factorial(int n){
    if(n < 2)
      return 1;
    return n * factorial(n-1);
}

Why? Because Clang inserts that result variable, which holds the return value, for you. That is what %1 is for. Look at the IR for a slightly modified version of your code.

Modified code:

enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    if(d) return 1;
    return 0;
}

IR:

define dso_local i32 @main() #0 !dbg !15 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4, !dbg !22
  %3 = load i32, i32* %2, align 4, !dbg !23
  %4 = icmp ne i32 %3, 0, !dbg !23
  br i1 %4, label %5, label %6, !dbg !25

5:                                                ; preds = %0
  store i32 1, i32* %1, align 4, !dbg !26
  br label %7, !dbg !26

6:                                                ; preds = %0
  store i32 0, i32* %1, align 4, !dbg !27
  br label %7, !dbg !27

7:                                                ; preds = %6, %5
  %8 = load i32, i32* %1, align 4, !dbg !28
  ret i32 %8, !dbg !28
}

Now you can see %1 being put to use: each return statement stores its value into %1, and the single exit block loads and returns it. In most functions with a single return statement, this variable is stripped by one of LLVM's passes.
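Read as C, the IR above does roughly the following. This is only a sketch: `retval`, `modified_main_lowered`, and the label names are made up here to stand in for `%1`, `@main`, and blocks 5, 6, and 7.

```c
enum days { MON, TUE, WED, THU };

/* Hand-written approximation of the unoptimized IR; not Clang output. */
int modified_main_lowered(void) {
    int retval;        /* %1: slot for the return value */
    enum days d;       /* %2: the local variable d */

    retval = 0;        /* store i32 0, i32* %1 */
    d = WED;           /* store i32 2, i32* %2 */
    if (d != 0)        /* load, icmp ne, br */
        goto ret_one;
    goto ret_zero;

ret_one:               /* block 5: "return 1" becomes a store plus a jump */
    retval = 1;
    goto done;

ret_zero:              /* block 6: "return 0" becomes a store plus a jump */
    retval = 0;
    goto done;

done:                  /* block 7: the only ret in the function */
    return retval;
}
```

Both `return` statements turn into a store to the same slot followed by a jump to one shared exit, which is why the slot is needed once there is more than one way out of the function.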

answered Nov 01 '22 03:11

droptop

Why does this matter — what's the actual problem?

I think the deeper answer you're looking for might be: LLVM's architecture is based around fairly simple frontends and many passes. The frontends have to generate correct code, but they don't have to generate good code. They can do the simplest thing that works.

In this case, Clang generates a couple of instructions that turn out not to be used for anything. That's generally not a problem, because some part of LLVM will get rid of superfluous instructions. Clang trusts that to happen. Clang doesn't need to avoid emitting dead code; its implementation may focus on correctness, simplicity, testability, etc.
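For a concrete picture of the superfluous instructions in the question's IR, here is a C sketch. The function names are hypothetical, and the second function shows only the expected effect of cleanup, not the output of any specific pass.

```c
enum days { MON, TUE, WED, THU };

/* Roughly what the frontend emitted: two stores that nothing reads. */
int main_as_emitted(void) {
    int retval = 0;      /* %1: written, never loaded */
    enum days d = WED;   /* %2: written, never loaded */
    (void)retval;        /* silence unused-variable warnings */
    (void)d;
    return 0;
}

/* What remains once the dead stores are removed. */
int main_after_cleanup(void) {
    return 0;
}
```

Both functions behave identically, so the frontend loses nothing by emitting the first form and leaving the cleanup to later stages.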

answered Nov 01 '22 03:11

arnt

Because Clang is done with syntax analysis but LLVM hasn't even started with optimization.

The Clang front end has generated IR (Intermediate Representation), not machine code. Those variables are SSA (Static Single Assignment) values; they haven't been bound to registers yet, and after optimization they never will be, because they are redundant.
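As a loose illustration of the single-assignment idea, written in C rather than IR (the function names are invented for this sketch): in the second function every name is assigned exactly once, which is the property that numbered values such as %1 and %2 have in the IR.

```c
/* Ordinary C: x is assigned twice. */
int reassigned(int a) {
    int x = a + 1;
    x = x * 2;
    return x;
}

/* Single-assignment style: each value gets a fresh name. */
int ssa_style(int a) {
    const int x1 = a + 1;
    const int x2 = x1 * 2;
    return x2;
}
```

The two functions compute the same result; only the naming discipline differs.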

That code is a somewhat literal representation of the source. It is what clang hands to LLVM for optimization. Basically, LLVM starts with that and optimizes from there. Indeed, for version 10 and x86_64, llc -O2 will eventually generate:

main: # @main
  xor eax, eax
  ret

answered Nov 01 '22 05:11

Olsonist
