So recently, in an attempt to hone my assembly skills, I wrote a VERY simple compiler for a toy language in C++. It runs in a single pass and emits code directly during the parsing phase to several string streams, each representing a section of the output (i.e. one represents section .bss, while others represent .data and .text). Afterwards, these string streams are written to a file, and I use NASM and gcc to assemble and link them. I know that this single-pass approach is horribly inefficient, but again, this was more of an exercise in understanding the code-generation stage than anything else. Anyway, I would like to modify my code to directly emit LLVM IR instead of raw assembly, again as a learning exercise. Is there any introductory-level guide to LLVM IR? Or, even better, a tool to determine the equivalent IR code for a line of assembly? I looked, and I only found the complete spec, which is WAY more information than I need.
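For anyone trying to picture the setup described above, here is a minimal sketch of the one-stream-per-section idea. The names `SectionEmitter`, `assemble`, and `emit_example` are hypothetical, and the emitted NASM lines are placeholders rather than output from the actual compiler.

```cpp
#include <sstream>
#include <string>

// Hypothetical emitter: one string stream per NASM section.
struct SectionEmitter {
    std::ostringstream bss, data, text;

    // Concatenate the sections in a fixed order into one assembly listing.
    std::string assemble() const {
        std::ostringstream out;
        out << "section .bss\n"  << bss.str()
            << "section .data\n" << data.str()
            << "section .text\n" << text.str();
        return out.str();
    }
};

// Illustrative only: what a parser might push into each stream as it goes.
std::string emit_example() {
    SectionEmitter e;
    e.bss  << "    counter: resd 1\n";          // uninitialized storage
    e.data << "    fmt: db \"%d\", 10, 0\n";    // initialized data
    e.text << "    global main\n"
           << "main:\n"
           << "    xor eax, eax\n"              // return 0
           << "    ret\n";
    return e.assemble();
}
```

Because each section lives in its own stream, the parser can append to .data or .bss at any point without disturbing the instructions already written to .text; the ordering is only fixed when `assemble()` is called.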
LLVM is written in C++ and is designed for compile-time, link-time, run-time, and "idle-time" optimization.
LLVM is a language-agnostic compiler toolchain that handles program optimization and code generation. It is based on its own internal representation, called LLVM IR, which is then transformed into machine code.
LLVM IR is an SSA-based representation that provides type safety, low-level operations, flexibility, and the capability of representing 'all' high-level languages cleanly. It is the common code representation used throughout all phases of the LLVM compilation strategy.
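To make the SSA point concrete: every computed value gets a fresh name that is assigned exactly once. The sketch below lowers `a*b + c` to textual IR that way. `SsaBuilder`, `binop`, and `lower_mul_add` are hypothetical names for illustration, not part of the LLVM API, and the operands `%a`, `%b`, `%c` are assumed to be `i32` values defined elsewhere.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: hands out a fresh SSA name for every computed value.
class SsaBuilder {
    std::ostringstream body_;
    int next_ = 0;
public:
    // Emit `%tN = <op> i32 <lhs>, <rhs>` and return the newly defined name.
    std::string binop(const std::string& op,
                      const std::string& lhs,
                      const std::string& rhs) {
        std::string name = "%t" + std::to_string(next_++);
        body_ << "  " << name << " = " << op << " i32 "
              << lhs << ", " << rhs << "\n";
        return name;
    }
    std::string str() const { return body_.str(); }
};

// Lower the expression a*b + c; no name is ever reassigned.
std::string lower_mul_add() {
    SsaBuilder b;
    std::string prod = b.binop("mul", "%a", "%b");
    b.binop("add", prod, "%c");
    return b.str();
}
```

Calling `lower_mul_add()` yields two instructions, `%t0 = mul i32 %a, %b` followed by `%t1 = add i32 %t0, %c`. Where a register-based emitter would overwrite the same register, the SSA form simply defines a new value.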
The backend of LLVM features a target-independent code generator that may create output for several types of target CPUs — including X86, PowerPC, ARM, and SPARC. The backend may also be used to generate code targeted at SPUs of the Cell processor or GPUs to support the execution of compute kernels.
The LLVM IR language reference is available here. Note that it's a detailed reference page, not a tutorial. There is no direct 1-to-1 correspondence between x86 assembly and LLVM IR. However, since LLVM IR is higher-level and more general than x86 assembly, it should not be too difficult to adapt a compiler from emitting x86 to emitting LLVM IR.
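As a rough illustration of what that adaptation could look like for a compiler that already writes text to string streams, here is a sketch that emits textual IR for a function equivalent to `int add(int a, int b) { return a + b; }`. The function name `emit_add_function` is made up for this example. Note that the IR names typed values rather than registers, which is one reason individual assembly lines do not map onto it directly.

```cpp
#include <sstream>
#include <string>

// Returns textual LLVM IR for: int add(int a, int b) { return a + b; }
// Assumes `int` is 32 bits wide, hence the i32 type.
std::string emit_add_function() {
    std::ostringstream ir;
    ir << "define i32 @add(i32 %a, i32 %b) {\n"
       << "entry:\n"
       << "  %sum = add i32 %a, %b\n"   // typed add, no register choice
       << "  ret i32 %sum\n"
       << "}\n";
    return ir.str();
}
```

The returned string can be written to a `.ll` file, much like the assembly listing was written out for NASM, and then handed to the LLVM tools (for example `llc` or `clang`) instead of an assembler.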
The official LLVM documentation comes with a detailed tutorial which is absolutely the best starting place for you - it walks through creating a toy compiler from a simplistic high-level programming language to LLVM IR. By working through it you will learn many of the key concepts of LLVM and will then be able to effectively use the aforementioned language reference.
If you find any problems with the tutorial, please report them to the LLVM bug tracker or mailing list. The tutorial is expected to be functional, and any reported problem will be fixed.
Another good beginning resource for understanding LLVM IR is the online demo page. It allows you to compile chunks of C code down to LLVM IR online (without installing anything), and should be very instrumental in understanding how basic programming constructs can be represented in LLVM IR.