
LLVM and the future of optimization

I realize that LLVM has a long way to go, but theoretically, can the optimizations that are in GCC/ICC/etc. for individual languages be applied to LLVM bytecode? If so, does this mean that any language that compiles to LLVM bytecode has the potential to be equally fast? Or are language-specific optimizations (before the LLVM bytecode stage) always going to play a large part in optimizing any specific program?

I don't know much about compilers or optimizations (only enough to be dangerous), so I apologize if this question isn't well defined.

asked Jul 12 '11 by Andrew Spott


2 Answers

In general, no.

For example, in Haskell a common optimization is strictness analysis, which allows the compiler to determine which values will always be demanded (reduced to head-normal form) and can therefore be forced and inlined without changing program semantics. This is not possible with LLVM.

Explanation: In Haskell, a function of type (Int, Int) -> Int is more or less equivalent to this type in C:

typedef int (*returns_int)(void);            /* a thunk that yields an int when forced */
struct pair { returns_int first, second; };  /* a pair of two such thunks */
typedef struct pair *(*returns_pair)(void);  /* a thunk that yields such a pair */

int function(returns_pair arg);

The compiler can analyze function and determine that it always evaluates its argument and always extracts the contents, transforming the function into this:

int function(int x, int y); // note that it takes *two* arguments now

This is well beyond the capability of LLVM. Maybe someday, with some really heavy interprocedural optimization, but realistically speaking, this will not happen in the foreseeable future.

Example 2: There are Java VMs which can transform virtual function calls into direct function calls. However, this isn't something that LLVM can do — because this transformation must be undone dynamically if another class is loaded which implements the same interface.
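To put Example 2 in C terms (an illustrative sketch of my own, not from the answer itself, with made-up names): a virtual call is roughly a call through a table of function pointers, and devirtualization replaces it with a direct call to a known function, which can then be inlined. A JIT can make that substitution speculatively and roll it back when a conflicting class is loaded; a static compiler working on LLVM bytecode cannot.

#include <stdio.h>

/* A "class" with one virtual method, modeled as a vtable of function pointers. */
struct vtable { int (*area)(const void *self); };

struct square { const struct vtable *vt; int side; };

static int square_area(const void *self) {
    const struct square *s = self;
    return s->side * s->side;
}

static const struct vtable square_vt = { square_area };

int main(void) {
    struct square sq = { &square_vt, 4 };

    /* Virtual call: indirect, through the vtable; hard to resolve statically. */
    int a1 = sq.vt->area(&sq);

    /* "Devirtualized" call: direct, because the callee is known; easy to inline.
       A JVM can make this substitution at run time and undo it if a new class
       appears; a static compiler cannot safely do so in general. */
    int a2 = square_area(&sq);

    printf("%d %d\n", a1, a2);
    return 0;
}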

In general, when you compile a program to LLVM you lose much of the semantic information about the original program. LLVM bytecode is capable of representing any code, but its type system is fairly limited — and your choice of type system affects what optimizations you can do.

answered by Dietrich Epp


In addition to Dietrich's excellent answer, I think it is important to appreciate that it is not just the compiler that determines how fast a language is. Beyond the optimizations a given language allows or disallows, there is also the matter of how you accomplish certain tasks in each language and what the language lets you do at all.

For example, it is relatively easy to optimize C code to maximize cache efficiency (reducing slow reads from memory), while this is much harder in Haskell. Pointer hacks are impossible in Java, as is the strategy of allocating a huge chunk of memory and parceling it out by hand.
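To make that last point concrete, here is a minimal sketch (my own illustration, with made-up names, not from the answer itself) of the "huge chunk of memory, parceled out by hand" strategy in C: a bump/arena allocator. Consecutive allocations land next to each other in memory, so traversals tend to stay in cache, and everything is released in one call.

#include <stddef.h>
#include <stdlib.h>

/* A tiny bump ("arena") allocator: one big malloc up front,
   then hand out consecutive slices with no per-object bookkeeping. */
struct arena {
    char  *base;
    size_t used;
    size_t capacity;
};

static int arena_init(struct arena *a, size_t capacity) {
    a->base = malloc(capacity);
    a->used = 0;
    a->capacity = capacity;
    return a->base != NULL;
}

static void *arena_alloc(struct arena *a, size_t size) {
    /* Round the request up to 16 bytes to keep allocations aligned. */
    size = (size + 15) & ~(size_t)15;
    if (a->used + size > a->capacity)
        return NULL;                /* out of space; no resizing in this sketch */
    void *p = a->base + a->used;
    a->used += size;
    return p;
}

static void arena_free_all(struct arena *a) {
    free(a->base);                  /* everything is released at once */
    a->base = NULL;
    a->used = a->capacity = 0;
}

A garbage-collected language without raw pointers gives you no comparable control over layout, which is exactly the kind of optimization opportunity the paragraph above is describing.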

Thus, some languages will always be slower simply because they don't allow for the same level of optimization. Note that I'm not necessarily saying this is a bad thing, because with that slowness come extremely powerful constructs.

I think that a better way to look at it is that LLVM will allow for a certain set of optimizations to be applied to all languages that compile down to it. Thus, while it will make such languages faster, it will not make them equally fast.

Edit: Pointer hacks in Haskell. So many possibilities ...

answered by Abhay Buch