
Which code in LLVM IR runs before "main()"?

Does anyone know the general rule for exactly which LLVM IR code will be executed before main?

When using Clang++ 3.6, it seems that global class variables have their constructors called via a function in the ".text.startup" section of the object file. For example:

define internal void @__cxx_global_var_init() section ".text.startup" {
  call void @_ZN7MyClassC2Ev(%class.MyClass* @M)
  ret void
}

From this example, I'd guess that I should be looking for exactly those IR function definitions that specify section ".text.startup".

I have two reasons to suspect my theory is correct:

  • I don't see anything else in my LLVM IR file (.ll) suggesting that the global object constructors should run first, assuming LLVM isn't sniffing for C++-specific function names like "__cxx_global_var_init". So section ".text.startup" is the only obvious means of saying that code should run before main(). Even if that's correct, it only identifies a sufficient condition for a function to run before main(); it doesn't show that this is the only way in LLVM IR to make that happen.

  • The GNU linker, in some cases, will use the first instruction in the .text section as the program entry point. This article on Raspberry Pi programming describes placing the .text.startup content first in the program's .text section, as a means of causing the .text.startup code to run first.

Unfortunately I'm not finding much else to support my theory:

  • When I grep the LLVM 3.6 source code for the string ".startup", I only find it in the Clang-specific parts of the code. For my theory to be correct, I would expect to find that string in other parts of LLVM as well, in particular in parts outside the C++ front-end.

  • This article on data initialization in C++ seems to hint at ".text.startup" having a special role, but it doesn't come right out and say that the Linux program loader actually looks for a section of that name. Even if it did, I'd be surprised to find a potentially Linux-specific section name carrying special meaning in platform-neutral LLVM IR.

  • The Linux 3.13.0 source code doesn't seem to contain the string ".startup", suggesting to me that the program loader isn't sniffing for a section with the name ".text.startup".

Christian Convey asked Jun 17 '15 14:06



1 Answer

The answer is pretty simple: LLVM does not execute anything behind the scenes. It is the job of the C runtime (CRT) to perform all the necessary preparation before running main(). This includes (but is not limited to) running static constructors and similar things. The runtime is usually informed about these objects via the addresses of the constructors, which are emitted into special sections (e.g. .init_array or .ctors). See e.g. http://wiki.osdev.org/Calling_Global_Constructors for more information.
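
To make this concrete, here is a minimal C++ sketch, assuming GCC or Clang on an ELF target such as Linux. The names startup_log and before_main are made up for illustration, and MyClass only stands in for the class from the question. At the IR level, Clang normally records such initializers in the @llvm.global_ctors array rather than marking them by section name; the backend then lowers that array to .init_array (or .ctors) entries, which the runtime walks before calling main().

```cpp
#include <string>
#include <vector>

// Hypothetical helper: a function-local static is constructed on first use,
// so it is safe to call from code that runs before main().
std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// Stand-in for MyClass from the question; the constructor has a side effect,
// so the global below needs dynamic initialization.
struct MyClass {
    MyClass() { startup_log().push_back("MyClass::MyClass"); }
};

// Global object: Clang emits an initializer function for it (compare
// __cxx_global_var_init in the question) and registers that function
// in @llvm.global_ctors.
MyClass M;

// GCC/Clang extension (not standard C++): this function is registered
// the same way and is also called by the runtime before main().
__attribute__((constructor)) static void before_main() {
    startup_log().push_back("before_main");
}
```

If main() inspects startup_log(), both entries should already be present. Their relative order is not something this sketch relies on. Compiling with clang++ -S -emit-llvm and searching the output for llvm.global_ctors should show the registration.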

Anton Korobeynikov answered Nov 02 '22 06:11