
Emitting llvm bytecode from clang: 'byval' attribute for passing objects with nontrivial destructor into a function

Tags:

llvm

clang

I have C++ source code which I parse using clang, producing llvm bytecode. From this point I want to process the file myself... However, I encountered a problem. Consider the following scenario:
- I create a class with a nontrivial destructor or copy constructor.
- I define a function where an object of this class is passed as a parameter by value (not by reference or pointer).

In the produced bytecode, I get a pointer instead. For classes without the destructor, the parameter is annotated as 'byval', but it is not so in this case. As a result, I cannot distinguish if the parameter is passed by value, or really by a pointer.

Consider the following example:

Input file - cpass.cpp:

class C {
  public:
  int x;
  ~C() {}
};

void set(C val, int x) {val.x=x;};

void set(C *ptr, int x) {ptr->x=x;}

Compilation command line:

clang++ -c cpass.cpp -emit-llvm -o cpass.bc; llvm-dis cpass.bc

Produced output file (cpass.ll):

; ModuleID = 'cpass.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

%class.C = type { i32 }

define void @_Z3set1Ci(%class.C* %val, i32 %x) nounwind {
  %1 = alloca i32, align 4
  store i32 %x, i32* %1, align 4
  %2 = load i32* %1, align 4
  %3 = getelementptr inbounds %class.C* %val, i32 0, i32 0
  store i32 %2, i32* %3, align 4
  ret void
}

define void @_Z3setP1Ci(%class.C* %ptr, i32 %x) nounwind {
  %1 = alloca %class.C*, align 8
  %2 = alloca i32, align 4
  store %class.C* %ptr, %class.C** %1, align 8
  store i32 %x, i32* %2, align 4
  %3 = load i32* %2, align 4
  %4 = load %class.C** %1, align 8
  %5 = getelementptr inbounds %class.C* %4, i32 0, i32 0
  store i32 %3, i32* %5, align 4
  ret void
}

As you can see, the parameters of both set functions look exactly the same. So how can I tell that the first function was meant to take the parameter by value, instead of a pointer?

One solution could be to somehow parse the mangled function name, but that may not always be viable. What if somebody puts extern "C" before the function?
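For reference, the name-parsing route can be sketched with abi::__cxa_demangle from <cxxabi.h>, which is available with GCC and clang on Itanium-ABI platforms. The helper names demangle and check_question_symbols are made up for this sketch. It only recovers the signature for C++-mangled symbols, so it does not help with the extern "C" case mentioned above.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI)
#include <string>

// Demangle an Itanium-ABI symbol name.
// Returns the input unchanged when demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}

// Quick self-check against the two symbols from the .ll file above
// (hypothetical helper, only for illustration).
void check_question_symbols() {
    assert(demangle("_Z3set1Ci") == "set(C, int)");    // by value
    assert(demangle("_Z3setP1Ci") == "set(C*, int)");  // by pointer
}
```

The demangled text distinguishes the two overloads, but it is a string that still has to be parsed, and it depends on the platform's mangling scheme.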

Is there a way to tell clang to keep the byval annotation, or to produce an extra annotation for each function parameter passed by value?

Anton Korobeynikov suggests that I should dig into clang's LLVM IR emission. Unfortunately, I know almost nothing about clang internals, and the documentation is rather sparse. The Internals Manual of clang does not talk about IR emission. So I don't really know how to start or where to go to get the problem solved, hopefully without actually going through all of the clang source code. Any pointers? Hints? Further reading?


In response to Anton Korobeynikov:

I know more or less what the C++ ABI looks like with respect to parameter passing. Found some good reading here: http://agner.org./optimize/calling_conventions.pdf. But this is very platform dependent! This approach might not be feasible on different architectures or in some special circumstances.

In my case, for example, the function is going to be run on a different device than where it is being called from. The two devices don't share memory, so they don't even share the stack. Unless the user is passing a pointer (in which case we assume he knows what he is doing), an object should always be passed within the function-parameters message. If it has a nontrivial copy constructor, it should be executed by the caller, but the object should be created in the parameter area as well.

So, what I would like to do is to somehow override the ABI in clang, without too much intrusion into their source code. Or maybe add some additional annotation, which would be ignored in a normal compilation pipeline, but I could detect when parsing the .bc/.ll file. Or somehow differently reconstruct the function signature.
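One candidate for such an "extra annotation" is clang's annotate attribute. The sketch below is untested against the clang version used in the question. The macro name PASSED_BY_VALUE and the string "passed_by_value" are invented here, and the macro expands to nothing on other compilers. My understanding is that clang records the string in the emitted IR (as an llvm.var.annotation call inside the function body), but that is an assumption you should verify with llvm-dis on your own toolchain.

```cpp
#include <cassert>

// Hypothetical marker macro: expands to clang's annotate attribute,
// and to nothing on compilers that are not clang.
#if defined(__clang__)
#define PASSED_BY_VALUE __attribute__((annotate("passed_by_value")))
#else
#define PASSED_BY_VALUE
#endif

// Same class as in the question.
class C {
  public:
  int x;
  ~C() {}
};

// The attribute is attached to the parameter declaration itself;
// it does not change how the function behaves.
void set(C val PASSED_BY_VALUE, int x) { val.x = x; }

// Hypothetical helper: the by-value call must leave the caller's object alone.
int annotated_call_demo() {
  C c;
  c.x = 5;
  set(c, 7);
  assert(c.x == 5);
  return c.x;
}
```

The drawback is that every by-value parameter has to be marked in the source (by hand or by a source-to-source pass), so this only helps if you control the code being compiled.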


CygnusX1 asked Oct 11 '22 15:10


1 Answer

Unfortunately, "byval" is not just an "annotation"; it is a parameter attribute which means a lot to optimizers and backends. Basically, the rules for how to pass small structs / classes with and without non-trivial functions are governed by the platform C++ ABI, so you cannot just always use byval here.

In fact, byval here is just the result of a minor optimization at the frontend level. When you pass something by value, a temporary object has to be constructed on the stack (via the default copy ctor). When you have a class which is POD-like, clang can deduce that the copy ctor is trivial and optimizes the ctor / dtor pair out, passing just the "contents".

For non-trivial classes (like in your case) clang cannot perform such an optimization and has to call both the ctor and the dtor. That is why you see a pointer to the temporary object being passed.
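The trivial / non-trivial distinction can at least be checked from inside the C++ source with <type_traits>. The following is a rough sketch, not clang's actual logic: the helper passed_indirectly and the struct Plain are made up for illustration, and the test only approximates the ABI rule (it ignores move constructors and deleted constructors).

```cpp
#include <type_traits>

// Same class as in the question: the user-provided destructor makes it
// non-trivial.
class C {
  public:
  int x;
  ~C() {}
};

// Hypothetical POD-like counterpart, added only for comparison.
struct Plain {
  int x;
};

// Approximation: a by-value parameter of type T needs a caller-made
// temporary passed by address when its copy ctor or dtor is non-trivial.
template <typename T>
constexpr bool passed_indirectly() {
  return !std::is_trivially_copy_constructible<T>::value ||
         !std::is_trivially_destructible<T>::value;
}

static_assert(passed_indirectly<C>(), "C has a non-trivial destructor");
static_assert(!passed_indirectly<Plain>(), "Plain is trivially copyable");
```

This tells you which classes fall into the pointer-to-temporary case, but it works at the source level, so it does not by itself mark anything in the .bc/.ll output.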

Try to call your set() functions and you'll see what's going on there.
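A small sketch of such a call site, assuming a variant of the class from the question with a counting copy constructor (the counter, the CallReport struct and call_both are additions for illustration). Compiling it with -emit-llvm should show the caller building the temporary and passing its address to the by-value overload.

```cpp
#include <cassert>

// Variant of the question's class; the copy counter is an addition.
class C {
  public:
  static int copies;  // number of copy-constructor calls so far
  int x;
  C() : x(0) {}
  C(const C& other) : x(other.x) { ++copies; }
  ~C() {}
};
int C::copies = 0;

void set(C val, int x) { val.x = x; }    // modifies the caller-made temporary
void set(C* ptr, int x) { ptr->x = x; }  // modifies the caller's object

// Hypothetical report type: copies made by each call, and the final value.
struct CallReport {
  int copies_by_value;
  int copies_by_pointer;
  int final_x;
};

CallReport call_both() {
  C c;
  int before = C::copies;
  set(c, 1);  // the caller copy-constructs a temporary for the parameter
  int by_value = C::copies - before;
  assert(c.x == 0);  // the original object is untouched

  before = C::copies;
  set(&c, 2);  // no copy, the callee writes through the pointer
  int by_pointer = C::copies - before;
  return {by_value, by_pointer, c.x};
}
```

The by-value call runs the copy constructor once in the caller, while the pointer call runs it zero times, which is the difference the two identical-looking IR signatures hide.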

Anton Korobeynikov answered Dec 19 '22 05:12