
Why does LLVM allocate a redundant variable?

Here's a simple C file with an enum definition and a main function:

enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    return 0;
}

Clang compiles it to the following LLVM IR:

define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4
  ret i32 0
}

%2 is evidently the d variable, which gets 2 assigned to it. What does %1 correspond to if zero is returned directly?

macleginn asked Jan 06 '20 10:01


3 Answers

%1 is a stack slot that Clang allocates to hold the function's return value, so that multiple return statements in a function can all funnel into a single exit block. Imagine you were writing a function to compute an integer's factorial. Instead of this:

int factorial(int n){
    int result;
    if(n < 2)
      result = 1;
    else{
      result = n * factorial(n-1);
    }
    return result;
}

you'd probably write this:

int factorial(int n){
    if(n < 2)
      return 1;
    return n * factorial(n-1);
}

Why? Because Clang inserts the result variable that holds the return value for you. That is the reason for the %1 variable. Look at the IR for a slightly modified version of your code.

Modified code:

enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    if(d) return 1;
    return 0;
}

IR:

define dso_local i32 @main() #0 !dbg !15 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4, !dbg !22
  %3 = load i32, i32* %2, align 4, !dbg !23
  %4 = icmp ne i32 %3, 0, !dbg !23
  br i1 %4, label %5, label %6, !dbg !25

5:                                                ; preds = %0
  store i32 1, i32* %1, align 4, !dbg !26
  br label %7, !dbg !26

6:                                                ; preds = %0
  store i32 0, i32* %1, align 4, !dbg !27
  br label %7, !dbg !27

7:                                                ; preds = %6, %5
  %8 = load i32, i32* %1, align 4, !dbg !28
  ret i32 %8, !dbg !28
}

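Now you can see %1 making itself useful: each return stores its value into %1 and branches to the single exit block, which loads %1 and returns it. In most functions with a single return statement, this variable is stripped by one of LLVM's passes once optimization is enabled.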
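The lowering in the IR above can be modeled back in C. This is only a hand-written sketch of the idea, not something Clang produces: the names with_two_returns, with_return_slot, retval and check_return_slot_model are made up for illustration, and the comments map each statement to the IR block it is meant to imitate.

```c
#include <assert.h>

enum days {MON, TUE, WED, THU};

/* Source-level form: two return statements. */
int with_two_returns(enum days d) {
    if (d) return 1;
    return 0;
}

/* Hypothetical model of the unoptimized lowering: every return writes
   into one slot (the role %1 plays in the IR) and jumps to one exit. */
int with_return_slot(enum days d) {
    int retval = 0;      /* models: store i32 0, i32* %1 */
    if (d) {
        retval = 1;      /* models block 5: store i32 1, i32* %1 */
        goto done;
    }
    retval = 0;          /* models block 6: store i32 0, i32* %1 */
    goto done;
done:
    return retval;       /* models block 7: load %1, then ret */
}

/* Both forms agree for every enumerator. */
void check_return_slot_model(void) {
    for (int d = MON; d <= THU; ++d)
        assert(with_two_returns((enum days)d) == with_return_slot((enum days)d));
}
```

Calling check_return_slot_model() from a test program should pass silently, since the two functions differ only in how the return value travels to the exit.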

droptop answered Nov 01 '22 03:11



Why does this matter? What's the actual problem?

I think the deeper answer you're looking for might be: LLVM's architecture is based around fairly simple frontends and many passes. The frontends have to generate correct code, but it doesn't have to be good code. They can do the simplest thing that works.

In this case, Clang generates a couple of instructions that turn out not to be used for anything. That's generally not a problem, because some part of LLVM will get rid of superfluous instructions. Clang trusts that to happen. Clang doesn't need to avoid emitting dead code; its implementation may focus on correctness, simplicity, testability, etc.
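To make the "dead code" point concrete, here is a rough C rendering of the two versions of main from the question. Both function names are invented for this sketch, and the second function is only an illustration of what a cleanup pass is allowed to leave behind, not the output of any particular pass.

```c
#include <assert.h>

enum days {MON, TUE, WED, THU};

/* Mirrors the unoptimized IR from the question: two stack slots are
   written, but neither value influences the result. */
int main_as_emitted(void) {
    int retval;          /* plays the role of %1 */
    enum days d;         /* plays the role of %2 */
    retval = 0;          /* store i32 0, i32* %1 */
    d = WED;             /* store i32 2, i32* %2 */
    (void)retval;        /* silence unused-variable warnings */
    (void)d;
    return 0;            /* ret i32 0 */
}

/* What remains once the unused slots and stores are removed. */
int main_after_cleanup(void) {
    return 0;
}
```

Since the observable behavior is identical, a later pass is free to replace the first form with the second, which is why the frontend does not need to bother.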

arnt answered Nov 01 '22 03:11



Because Clang is done with syntax analysis but LLVM hasn't even started with optimization.

The Clang front end has generated IR (Intermediate Representation), not machine code. Those variables are SSA (Static Single Assignment) values; they haven't been bound to registers yet, and after optimization they never will be, because they are redundant.

That code is a somewhat literal representation of the source. It is what clang hands to LLVM for optimization. Basically, LLVM starts with that and optimizes from there. Indeed, for version 10 and x86_64, llc -O2 will eventually generate:

main: # @main
  xor eax, eax
  ret

Olsonist answered Nov 01 '22 05:11
