The operands for an llvm::User (e.g. an instruction) are llvm::Values.
After the mem2reg pass, variables are in SSA form, and the names they had in the original source code are lost. Value::getName() is only set for some values; for most variables, which are intermediaries, it's not set.
The instnamer pass can be run to give all the variables names like tmp1 and tmp2, but this doesn't capture where they originally come from. Here's some LLVM IR beside the original C code:
I am building a simple HTML page to visualise and debug some optimisations I am working on, and I want to show the SSA variables in name_ver notation (source name plus SSA version number), rather than just temporary instnamer names. It's just to aid readability.
I am getting my LLVM IR from clang with a command line such as:
clang -g3 -O1 -emit-llvm -o test.bc -c test.c
There are calls to llvm.dbg.declare and llvm.dbg.value in the IR; how do you turn these into the original source-code names and SSA version numbers?
So how can I determine the original variable name (or named constant name) from an llvm::Value? Debuggers must be able to do this, so how can I?
This is part of the debug information that's attached to LLVM IR in the form of metadata. Documentation is here. An old blog post with some background is also available.
$ cat > z.c
long fact(long arg, long farg, long bart)
{
    long foo = farg + bart;
    return foo * arg;
}
$ clang -emit-llvm -O3 -g -c z.c
$ llvm-dis z.bc -o -
Produces this:
define i64 @fact(i64 %arg, i64 %farg, i64 %bart) #0 {
entry:
  tail call void @llvm.dbg.value(metadata !{i64 %arg}, i64 0, metadata !10), !dbg !17
  tail call void @llvm.dbg.value(metadata !{i64 %farg}, i64 0, metadata !11), !dbg !17
  tail call void @llvm.dbg.value(metadata !{i64 %bart}, i64 0, metadata !12), !dbg !17
  %add = add nsw i64 %bart, %farg, !dbg !18
  tail call void @llvm.dbg.value(metadata !{i64 %add}, i64 0, metadata !13), !dbg !18
  %mul = mul nsw i64 %add, %arg, !dbg !19
  ret i64 %mul, !dbg !19
}
With -O0 instead of -O3, you won't see llvm.dbg.value, but you will see llvm.dbg.declare.
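If you are working from the textual output of llvm-dis (for example to build the HTML page) rather than from the in-memory API, the same association can be read straight off the llvm.dbg.value lines. The following is a minimal sketch, assuming the old "metadata !{type %value}, i64 offset, metadata !N" syntax shown in the listing above and simple type names such as i64; mapValuesToVarNodes is a hypothetical helper, not part of LLVM. It only recovers the metadata node number; the node's definition, which is not shown in the listing, is where the source name lives.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not an LLVM API): scan textual IR and map each SSA
// value name to the number of the metadata node describing its variable.
// Assumes the old "metadata !{i64 %x}, i64 0, metadata !N" call syntax.
std::map<std::string, int> mapValuesToVarNodes(const std::string& irText) {
    // Capture 1: SSA value name without '%'. Capture 2: metadata node number.
    static const std::regex dbgValue(
        R"re(@llvm\.dbg\.value\(metadata !\{\w+ %([\w.]+)\}, i64 \d+, metadata !(\d+)\))re");
    std::map<std::string, int> result;
    std::istringstream lines(irText);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (std::regex_search(line, m, dbgValue)) {
            result[m[1].str()] = std::stoi(m[2].str());
        }
    }
    return result;
}
```

Lines that are not llvm.dbg.value calls are skipped, so the whole function body can be passed in unchanged. Pointer or aggregate operand types would need a looser type pattern than \w+.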
Given a Value, you can get the variable name by traversing all the llvm.dbg.declare and llvm.dbg.value calls in the enclosing function, checking whether any of them refers to that value, and if so, returning the DIVariable that the intrinsic call associates with it.
So, the code should look something like (roughly, not tested or even compiled):
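// Header paths are assumed to match the LLVM release that prints the IR
// syntax above (around 3.4); later releases moved some of them
// (e.g. to llvm/IR/DebugInfo.h and llvm/IR/InstIterator.h).
#include "llvm/DebugInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/InstIterator.h"

using namespace llvm;

// Returns the function containing V, or NULL if V is neither an
// argument nor an instruction (e.g. a global or a constant).
const Function* findEnclosingFunc(const Value* V) {
  if (const Argument* Arg = dyn_cast<Argument>(V)) {
    return Arg->getParent();
  }
  if (const Instruction* I = dyn_cast<Instruction>(V)) {
    return I->getParent()->getParent();
  }
  return NULL;
}

// Scans F for a debug intrinsic that refers to V and returns the
// variable metadata it carries, or NULL if there is none.
const MDNode* findVar(const Value* V, const Function* F) {
  for (const_inst_iterator Iter = inst_begin(F), End = inst_end(F); Iter != End; ++Iter) {
    const Instruction* I = &*Iter;
    if (const DbgDeclareInst* DbgDeclare = dyn_cast<DbgDeclareInst>(I)) {
      if (DbgDeclare->getAddress() == V) return DbgDeclare->getVariable();
    } else if (const DbgValueInst* DbgValue = dyn_cast<DbgValueInst>(I)) {
      if (DbgValue->getValue() == V) return DbgValue->getVariable();
    }
  }
  return NULL;
}

// Returns the source-level name for V, falling back to V's own name
// (outside a function) or "tmp" (no debug intrinsic refers to V).
StringRef getOriginalName(const Value* V) {
  // TODO handle globals as well
  const Function* F = findEnclosingFunc(V);
  if (!F) return V->getName();

  const MDNode* Var = findVar(V, F);
  if (!Var) return "tmp";

  return DIVariable(Var).getName();
}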
You can see above I was too lazy to add handling of globals, but it's not that big a deal actually: this requires iterating over all the globals listed under the current compile unit's debug info (use M.getNamedMetadata("llvm.dbg.cu") to get a list of all the compile units in the current module), then checking which one matches your variable (via the getGlobal method) and returning its name.
However, keep in mind the above will only work for values directly associated with original variables. Any value that is a result of any computation will not be properly named this way; and in particular, values that represent field accesses will not be named with the field name. This is doable but requires more involved processing - you'll have to identify the field number from the GEP, then dig into the type debug information for the struct to get back the field name. Debuggers do that, yes, but no debugger operates in LLVM IR land - as far as I know even LLVM's own LLDB works differently, by parsing the DWARF in the object file into Clang types.
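The question also asks for SSA version numbers, which the name lookup by itself does not produce. One possible approach, sketched here under the assumption that you have already collected (SSA value, source variable) pairs in the order their llvm.dbg.value calls appear, is to number the values bound to each source variable in order of appearance. assignVersions and the name_N label format are illustrative choices, not anything LLVM provides.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: given (ssaValue, sourceVariable) pairs in the order
// the llvm.dbg.value calls appear, label each SSA value "name_N", where N
// counts how many distinct values were bound to that variable before it.
std::map<std::string, std::string> assignVersions(
    const std::vector<std::pair<std::string, std::string>>& bindings) {
    std::map<std::string, int> nextVersion;     // per source variable
    std::map<std::string, std::string> labels;  // SSA value -> "name_N"
    for (const auto& b : bindings) {
        const std::string& ssaValue = b.first;
        const std::string& sourceVar = b.second;
        if (labels.count(ssaValue)) continue;   // keep the first binding
        int version = nextVersion[sourceVar]++;
        labels[ssaValue] = sourceVar + "_" + std::to_string(version);
    }
    return labels;
}
```

Numbering by order of appearance in the function is only one convention; for a visualisation it has the advantage of being stable as long as the instruction order does not change. Values that no debug intrinsic refers to get no label and would still need a fallback name.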