I am looking into the LLVM system and I have read through the Getting Started documentation. However, some of the nomenclature (and the wording in the clang example) is still a little confusing. The following terms and commands are all part of the compilation process, and I was wondering if someone might be able to explain them a little better for me:
clang -S vs. clang -c (I know what -c does, but how do the results differ?)
(Edit) At a higher level, I understand the overall compilation process and can track my way through it fairly well. I just get stuck at some points where, for example, I expect to see "IR" but instead see "bitcode" or "LLVM assembly", which makes me think I don't understand them nearly as well as I should!
On the front end, the LLVM compiler infrastructure uses clang — a compiler for programming languages C, C++ and CUDA — to turn source code into an interim format.
LLVM is a compiler backend meant for building compilers on top of. It handles optimization and the generation of code for the target architecture. Clang is a front end that parses C, C++ and Objective-C code and translates it into a representation suitable for LLVM.
LLVM (used to mean "Low Level Virtual Machine" but not anymore) is a compiler infrastructure, written in C++, which is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs written in arbitrary programming languages.
Cross compilation issues
On the other hand, Clang/LLVM is natively a cross-compiler, meaning that one set of programs can compile to all targets by setting the -target option.
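As a rough illustration of the cross-compiler point, the sketch below asks a single clang binary to emit assembly for two different targets. The file name cross.c and the two target triples are assumptions chosen for the example, and whether a given triple works depends on which backends your clang build includes. --target=<triple> is the long spelling of the -target option.

```shell
#!/bin/sh
# Sketch: emit assembly for two targets from the same source with one clang.
# cross.c and the triples below are illustrative assumptions.
cat > cross.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF

for triple in x86_64-unknown-linux-gnu aarch64-unknown-linux-gnu; do
    if command -v clang >/dev/null 2>&1; then
        # -S stops after generating assembly, so no target sysroot,
        # assembler or linker for the other platform is needed.
        clang --target="$triple" -S cross.c -o "cross-$triple.s" \
            || echo "clang could not target $triple on this machine"
    else
        echo "clang not found; skipping $triple"
    fi
done
```

Comparing the two .s files side by side should show the same function expressed in two different native assembly languages.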
In general, Clang accepts the same command-line options as GCC. The -c option (compile and assemble, but do not link) and the -S option (compile only; do not assemble or link) mean the same thing in both.
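To make the difference concrete, here is a minimal sketch that runs both options on the same file and lists what comes out. It assumes clang is on your PATH; hello.c is a hypothetical file created just for the example. The -emit-llvm flag is not mentioned above but is a standard clang option: combined with -S it produces textual LLVM IR, and combined with -c it produces LLVM bitcode, which is likely where the "bitcode" and "LLVM assembly" terms show up.

```shell
#!/bin/sh
# Sketch: compare the outputs of -S and -c for one source file.
# hello.c is a hypothetical example file written by this script.
cat > hello.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF

if command -v clang >/dev/null 2>&1; then
    clang -S hello.c -o hello.s               # native assembly (text)
    clang -c hello.c -o hello.o               # native object file (binary)
    clang -S -emit-llvm hello.c -o hello.ll   # LLVM IR, textual form
    clang -c -emit-llvm hello.c -o hello.bc   # LLVM IR, bitcode form
    ls -l hello.s hello.o hello.ll hello.bc
else
    echo "clang not found; skipping"
fi
```

hello.s and hello.ll can be opened in a text editor; hello.o and hello.bc are binary files.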
To quote from another answer of mine on this site:
LLVM IR is typically stored on disk in either text files with .ll extension or in binary files with .bc extension. Conversion between the two is trivial, and you can just use llvm-dis for bc -> ll and llvm-as for ll -> bc. The binary format is more memory-efficient, while the textual format is human-readable.
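A small round trip between the two formats might look like the sketch below. It assumes llvm-as and llvm-dis are installed under those names (some distributions add a version suffix); add.ll and the function in it are made up for the example.

```shell
#!/bin/sh
# Sketch: convert textual IR to bitcode and back again.
# add.ll is a hypothetical hand-written module.
cat > add.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v llvm-dis >/dev/null 2>&1; then
    llvm-as add.ll -o add.bc              # ll -> bc
    llvm-dis add.bc -o add.roundtrip.ll   # bc -> ll
    cat add.roundtrip.ll                  # same function, back in text form
else
    echo "llvm-as/llvm-dis not found; skipping"
fi
```

The round-tripped file may carry a few extra header lines, but the function definition itself should come back unchanged.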
In addition, there are some commonly used aliases:
LLVM bitcode (or LLVM bytecode): the binary format.
LLVM assembly: the textual format.
In any case, it all means the same thing, under potentially different representations.
Native assembly is what many typically think about when hearing the term "assembly" - the low-level language with almost 1:1 mapping to your native machine binary. Unlike LLVM assembly, native assembly is very target-dependent (examples are x86 assembly, ARM assembly, etc.). Native assembly is assembled into native binary via an assembler - LLVM does include one, though you can also use other assemblers (e.g. gas).
Native binary - the result of the assembling process - is of course the (only) language the computer really speaks, and after linking it can be loaded into memory and run directly on your hardware.
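Putting the stages together, the following sketch walks one file from C source through LLVM IR and native assembly to a linked executable, one step per command. It assumes clang and llc (the LLVM static compiler) are both on your PATH and that you are building for the host machine; prog.c is a hypothetical file.

```shell
#!/bin/sh
# Sketch: run each compilation stage separately.
# prog.c is a hypothetical example file written by this script.
cat > prog.c <<'EOF'
int main(void) { return 0; }
EOF

if command -v clang >/dev/null 2>&1 && command -v llc >/dev/null 2>&1; then
    # 1. C source -> LLVM IR (textual LLVM assembly)
    clang -S -emit-llvm prog.c -o prog.ll &&
    # 2. LLVM IR -> native assembly for the host target
    llc prog.ll -o prog.s &&
    # 3. native assembly -> native object file
    clang -c prog.s -o prog.o &&
    # 4. link the object file into an executable
    clang prog.o -o prog &&
    # 5. load and run it on the hardware
    ./prog && echo "prog ran and exited with status 0"
else
    echo "clang or llc not found; skipping"
fi
```

In everyday use a single clang prog.c -o prog performs all of these steps internally; splitting them up is only meant to show where each representation appears.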