llvm extract struct elements and struct size in C++

LLVM newbie here. I have the following C++ program:

using namespace std;
struct A{
  int i;
  int j;
};

int main()
{
   struct A obj;
   obj.i = 10;
   obj.j = obj.i;
   return 0;
}

Using clang++, I can see that the LLVM IR contains the struct type, as below:

%struct.A = type { i32, i32 }

I would like to obtain the structure elements using an LLVM pass. I wrote the following pass, which iterates through the global variables and through the operands of every instruction, but neither approach helps me extract struct A, A.i or A.j.

    #include "llvm/Pass.h"
    #include "llvm/IR/Function.h"
    #include "llvm/Support/raw_ostream.h"

    #include <llvm/IR/Constants.h>
    #include <llvm/IR/DerivedTypes.h>
    #include <llvm/IR/Instructions.h>
    #include <llvm/IR/IntrinsicInst.h>
    #include <llvm/IR/LLVMContext.h>
    #include <llvm/IR/Module.h>

    #include <iostream>
    #include <map>
    #include <vector>


    using namespace llvm;

    namespace {

    class StructModulePass : public ModulePass {
    public:
      static char ID;
      StructModulePass() : ModulePass(ID) {}

      bool runOnModule(Module &M) override {
        // Iterate over the module's global variables and print their names.
        int i = 0;
        for (auto G = M.global_begin(); G != M.global_end(); ++G, ++i) {
          errs() << i << " == > ";
          errs().write_escaped(G->getName()) << "\n";
        }

        // Iterate through each instruction: module -> function -> BB -> inst,
        // and print the name of every operand.
        for (Function &F : M.functions()) {
          for (BasicBlock &B : F) {
            for (Instruction &I : B) {
              for (unsigned Op = 0; Op < I.getNumOperands(); ++Op)
                std::cerr << I.getOperand(Op)->getName().str() << std::endl;
            }
          }
        }
        return false; // the module is only inspected, never modified
      }
    };

    } // end anonymous namespace

    char StructModulePass::ID = 0;
    static RegisterPass<StructModulePass> X("getstructnamesize", "Get All Struct Names and Sizes",
                                 false /* Only looks at CFG */,
                                 false /* Analysis Pass */);

I want to create a database of all structures (global and local) defined and used in my program, e.g. < A , <int32, int32> , B , <int32, bool , char *>>.

I have gone through the doxygen pages and the LLVM tutorials to see whether the struct types can be retrieved, but I cannot find a way to extract them without already knowing their contents (for example, by creating an IRBuilder and inserting variables of a predefined IntTy32 type). Any help, or pointers to relevant tutorials, would be appreciated.

Shehbaz Jaffer asked Mar 28 '16 18:03



1 Answer

In LLVM IR terminology, a "global" is a global variable or global constant. This line:

%struct.A = type { i32, i32 }

is an identified structure type definition, not a global variable, just as a typedef in C++ is not a global variable. You can iterate over these types using Module::getIdentifiedStructTypes().

Some notes, however:

  1. Get familiar with the dump() method. It's a far easier alternative to all your prints to cerr.

  2. You're using getName() on values, not on types - I don't think that's what you meant to do. Also keep in mind LLVM values do not necessarily have names.

  3. Getting out results like <int32, bool, char *> - which are C++ types, not LLVM IR types - will be tricky. For instance, Clang will probably compile both bool and char to i8, and it won't be easy to tell the difference. You might also get a vptr field, padding fields, etc. If you really do want the actual C++ structure of the structs used in the source program, you have to rely on debug info.
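Part of note 3 can be seen without LLVM at all. The sketch below uses only standard C++ and a made-up struct B mirroring the <int32, bool, char *> example. It shows that bool and char usually occupy the same number of bytes, which is why they are hard to tell apart once lowered to IR, and that the compiler may leave padding between fields. The exact numbers are ABI-dependent, so none are asserted here.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical struct mirroring the <int32, bool, char *> example.
struct B {
  int n;
  bool flag;
  char *text;
};

// Bytes the compiler left unused between `flag` and `text`.
std::size_t paddingBeforeText() {
  return offsetof(B, text) - (offsetof(B, flag) + sizeof(bool));
}

// Prints the sizes and offsets chosen by the compiler for this target.
void printLayout() {
  std::printf("sizeof(bool)=%zu sizeof(char)=%zu\n", sizeof(bool), sizeof(char));
  std::printf("offset n=%zu flag=%zu text=%zu\n",
              offsetof(B, n), offsetof(B, flag), offsetof(B, text));
  std::printf("padding before text=%zu, sizeof(B)=%zu\n",
              paddingBeforeText(), sizeof(B));
}
```

Calling printLayout() on a given platform shows the layout that the IR struct type has to reproduce, possibly with explicit padding fields in packed or unusual layouts.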

Oak answered Oct 12 '22 11:10
