
What is the right abstraction for a compilation unit in LLVM?

In LLVM we have the LLVMContext, which is the unit of storage, and the llvm::Module, which is where new symbols (functions and types) are built.

My question is: what is the right LLVM abstraction to use for a compilation unit? Is it the Module, or is that actually meant for a bigger scope, e.g. a shared library target?

It seems to me that a compilation unit must give an all-or-nothing result: either all of its content compiles without errors, or there are errors and it must be fixed and rebuilt before any symbol in the CU is usable. In my head, this is the definition of what a compilation unit should represent.

If Module is the right abstraction for the CU, how do I present the symbols in other (correctly compiled) Module objects to a new module that is about to be built, so that it is able to find them? Do I need to add declarations, or is there a more expedient way to do this?

A pointer to a relevant line in clang would be of great help.

lurscher asked Mar 23 '12 16:03



2 Answers

The Module is the correct abstraction for a compile unit. You can link together modules to do whole program analysis from there.

echristo answered Sep 18 '22 17:09


This is an in-progress attempt to answer my own question:

The class llvm::Linker can take multiple Modules and return a single composite Module containing all the symbols in the input modules. After the linking is done and the composite module is created, I'm still not clear on what the rules are regarding ownership of the input modules.

In any case, the class should let you grow a module incrementally. Say you are trying to implement a REPL, which means that you keep adding new symbols to the global namespace.

The outline of the REPL would be:

  • write some function in REPL
  • compile the function as a single module, call it "base"
  • write some more functions in REPL
  • compile the new functions in a new module
  • if the new module compiles successfully, link "base" and the new module into a new module, call it "base.2"
  • rinse and repeat

If you replace a symbol or function by name, you want older symbols to see the overridden version. So when you define a new function, you need to make sure getOrInsertFunction is called in the existing "base" module as well as in the new one.
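The outline above can be sketched without LLVM at all. The following is a toy model in plain C++, not the LLVM API: `ToyModule`, `Body`, `verify`, `linkInto` and `call` are hypothetical names standing in for a module, a function body, a verification step, a link step and a lookup by name. It assumes that a module which fails to compile must leave "base" untouched, that a later definition replaces an earlier one with the same name, and that the link step takes ownership of the module it consumes. For brevity it updates "base" in place instead of producing "base.2".

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct ToyModule;

// Hypothetical stand-in for a compiled function. It receives the module it
// runs in, so it can reach other symbols by name.
using Body = std::function<int(const ToyModule &, int)>;

// Hypothetical stand-in for a module: a named table of definitions.
struct ToyModule {
    std::string name;
    std::map<std::string, Body> defs;
};

// All-or-nothing check: the module is usable only if every symbol has a body.
bool verify(const ToyModule &m) {
    for (const auto &kv : m.defs)
        if (!kv.second) return false;
    return true;
}

// Look a symbol up by name at call time and run it.
// Throws std::out_of_range if the symbol is not defined.
int call(const ToyModule &env, const std::string &sym, int arg) {
    return env.defs.at(sym)(env, arg);
}

// Link `src` into `base`, consuming `src`. If `src` is null or does not
// verify, return false and leave `base` exactly as it was.
bool linkInto(ToyModule &base, std::unique_ptr<ToyModule> src) {
    if (!src || !verify(*src)) return false;
    for (auto &kv : src->defs)
        base.defs[kv.first] = std::move(kv.second);  // newer definition wins
    return true;
}
```

Because `call` resolves names at call time, a function already in "base" that calls `inc` picks up a redefined `inc` as soon as the module containing it has been linked in. A module with an empty body is rejected as a whole, and "base" stays as it was.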

lurscher answered Sep 18 '22 17:09