How to see lowered C++

Tags:

c++

I'm trying to improve my understanding of how C++ actually works. Is there a way to see how the compiler lowers my code into something simpler? For example, I'd like to see where all the copy constructors are called, how overloaded function calls have been resolved, the code with all template expansion and instantiation completed, etc. Right now I'm learning how C++ compilers interpret my code through experimentation, but it'd be nice just to see a lowered form of my code, even if it is very ugly. I'm looking for something analogous to g++ -E, which shows the result of the preprocessor, but for the C++ language itself.

Edit: I should have added that I'm not looking for a disassembler. There's a huge gulf between C++ source code and assembly code. Inside this gulf are complicated things like template metaprogramming and all sorts of implicit calls to operator methods (assignments! casts! constructors! ...) as well as heavily overloaded functions with very complicated resolution rules, etc. I'm looking for tools to help me understand how my code is interpreted by the C++ compiler. Right now, the only thing I can do is run little experiments and piece together an understanding of what the compiler is doing. I'd like to see more detail on what's going on. It would help greatly, for example, in debugging template metaprogramming problems.
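One common way to run the small experiments mentioned above is a type whose special member functions record each call, plus a set of overloads that report which one was chosen. The sketch below is illustrative only: Tracer, trace_log, by_value and the which overloads are hypothetical names, and it assumes C++11 or later.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical global log: each special member function appends its name,
// so calls the compiler inserts implicitly become visible at run time.
std::vector<std::string> trace_log;

// Hypothetical tracing type: every constructor, assignment and the
// destructor record themselves in trace_log.
struct Tracer {
    Tracer() { trace_log.push_back("default ctor"); }
    Tracer(const Tracer&) { trace_log.push_back("copy ctor"); }
    Tracer(Tracer&&) { trace_log.push_back("move ctor"); }
    Tracer& operator=(const Tracer&) {
        trace_log.push_back("copy assign");
        return *this;
    }
    Tracer& operator=(Tracer&&) {
        trace_log.push_back("move assign");
        return *this;
    }
    ~Tracer() { trace_log.push_back("dtor"); }
};

// Pass-by-value parameter: calling this with an lvalue forces a copy.
void by_value(Tracer) {}

// Overload-resolution probe: the returned string names the chosen overload.
std::string which(int) { return "int"; }
std::string which(long) { return "long"; }
std::string which(double) { return "double"; }
```

For instance, { Tracer a; Tracer b = a; b = a; } logs default ctor, copy ctor, copy assign, dtor, dtor, and which('a') selects the int overload because char is promoted to int. This only shows the effects at run time, which is exactly the limitation the question is about.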

Bryan Catanzaro asked Aug 26 '11 at 15:08


2 Answers

At the moment, I think that your best bet is Clang (you can try some simple code on the Try Out LLVM page).

When compiling C, C++ or Objective-C with Clang/LLVM, you can ask the compiler to emit its intermediate representation (LLVM IR) instead of going all the way to assembly or binary form; for example, clang++ -S -emit-llvm foo.cpp -o foo.ll writes the IR as readable text.

The LLVM IR is a fully specified language used internally by the compiler:

  • Clang lowers the C++ code to LLVM IR
  • LLVM optimizes the IR
  • An LLVM backend (for example x86) produces the assembly from the IR

The IR is the last step before machine-specific code, so you don't have to learn specific assembly directives and you still get a very low-level representation of what's really going on under the hood.

You can get the IR both before and after optimizations (for example by compiling with -O0 or -O2); the latter is more representative of real code, but further away from what you originally wrote.

Example with a C program:

#include <stdio.h>
#include <stdlib.h>

static int factorial(int X) {
  if (X == 0) return 1;
  return X*factorial(X-1);
}

int main(int argc, char **argv) {
  printf("%d\n", factorial(atoi(argv[1])));
}
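Since the question is about C++, a C++ input may be more instructive to feed to Clang. The following is a hypothetical sketch (Counter, twice and run are made-up names) that combines a user-provided copy constructor with a function template used for two types. Compiled with clang++ -S -emit-llvm -O0, the IR should contain an explicit call to Counter's copy constructor at the call to twice, and two separate mangled factorial functions, one per instantiation.

```cpp
#include <cassert>

// Hypothetical type with a user-provided copy constructor, so copies show
// up as explicit constructor calls in the unoptimized IR.
struct Counter {
    int value;
    explicit Counter(int v) : value(v) {}
    Counter(const Counter& other) : value(other.value) {}
};

// Function template: each distinct T used below becomes its own
// instantiation, i.e. a separate function in the IR. Assumes x >= 0.
template <typename T>
T factorial(T x) {
    return x == 0 ? T(1) : x * factorial<T>(x - 1);
}

// Taking Counter by value forces a copy at every call site.
int twice(Counter c) { return 2 * c.value; }

int run() {
    Counter c(3);
    int a = factorial(5);     // instantiates factorial<int>
    long b = factorial(6L);   // instantiates factorial<long>
    return a + static_cast<int>(b) + twice(c);  // twice(c) copies c
}
```

Comparing the -O0 output with the -O2 output of the same file is a quick way to see which of these calls survive optimization.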

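Corresponding IR, after optimization (factorial has been inlined into main and its recursion turned into a loop, and the call to atoi has become a call to strtol):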

; ModuleID = '/tmp/webcompile/_10956_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00"

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
; <label>:0
  %1 = getelementptr inbounds i8** %argv, i64 1
  %2 = load i8** %1, align 8, !tbaa !0
  %3 = tail call i64 @strtol(i8* nocapture %2, i8** null, i32 10) nounwind
  %4 = trunc i64 %3 to i32
  %5 = icmp eq i32 %4, 0
  br i1 %5, label %factorial.exit, label %tailrecurse.i

tailrecurse.i:                                    ; preds = %tailrecurse.i, %0
  %indvar.i = phi i32 [ %indvar.next.i, %tailrecurse.i ], [ 0, %0 ]
  %accumulator.tr1.i = phi i32 [ %6, %tailrecurse.i ], [ 1, %0 ]
  %X.tr2.i = sub i32 %4, %indvar.i
  %6 = mul nsw i32 %X.tr2.i, %accumulator.tr1.i
  %indvar.next.i = add i32 %indvar.i, 1
  %exitcond = icmp eq i32 %indvar.next.i, %4
  br i1 %exitcond, label %factorial.exit, label %tailrecurse.i

factorial.exit:                                   ; preds = %tailrecurse.i, %0
  %accumulator.tr.lcssa.i = phi i32 [ 1, %0 ], [ %6, %tailrecurse.i ]
  %7 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 %accumulator.tr.lcssa.i) nounwind
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

declare i64 @strtol(i8*, i8** nocapture, i32) nounwind

!0 = metadata !{metadata !"any pointer", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}

I personally find it relatively readable once you get past the initial learning curve: it tries, to some extent, to preserve variable names, and the function names are still there.
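With C++ input the function names are still there, but mangled according to the Itanium C++ ABI used by Clang and GCC on Linux (for example, a non-static int factorial(int) becomes _Z9factoriali). Piping the output through c++filt demangles them; the sketch below does the same programmatically, assuming a toolchain that provides <cxxabi.h> (libstdc++ or libc++abi). The demangle helper is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Deleter for the buffer that __cxa_demangle allocates with malloc.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Hypothetical helper: turn an Itanium-ABI mangled name (as found in
// clang++ -emit-llvm or g++ -S output) back into a readable signature.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    return (status == 0 && result) ? std::string(result.get()) : mangled;
}
```

For example, demangle("_ZN3FooC1ERKS_") gives Foo::Foo(Foo const&), which is how a copy constructor call appears in the IR.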

Matthieu M. answered Oct 09 '22 at 21:10


The first C++ compiler was cfront, which was, as the name implies, a front-end for C; in theory, cfront's output is what you'd like to see. But cfront hasn't been available for many years; it was a commercial product, and the source is not available.
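To make the idea of a C++-to-C translation concrete, here is a hand-written illustration of what such a front end does with a member function: the method becomes an ordinary function that receives the object through an explicit pointer. This is a simplified sketch, not actual cfront output; Point, Point_lowered and Point_lowered_sum are hypothetical names.

```cpp
#include <cassert>

// C++ as written: a class with a const member function.
struct Point {
    int x, y;
    int sum() const { return x + y; }
};

// Hand-lowered equivalent (illustrative only, not real cfront output):
// the data layout stays the same, and the member function becomes a free
// function that receives the object through an explicit pointer.
struct Point_lowered {
    int x, y;
};

int Point_lowered_sum(const Point_lowered* self) {
    return self->x + self->y;
}

// A call p.sum() on a Point p corresponds to Point_lowered_sum(&q)
// on the equivalent Point_lowered q.
```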

Modern C++ compilers don't use a C intermediary; if there's an intermediary at all, it's an internal compiler representation, not something you'd enjoy looking at! The -S option to g++ will spit out *.s files: assembly code, which includes just enough symbols that you could, in theory, follow it.

Ernest Friedman-Hill answered Oct 09 '22 at 20:10