Accessing struct members and arrays of structs from LLVM IR

Tags: c++, jit, llvm

If I have a C++ program that declares a struct, say:

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

and I generate some LLVM IR via the LLVM C++ API to mirror the C++ declaration:

vector<Type*> members;
members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
// since LLVM doesn't support unions, just use an ArrayType that's the same size
members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );

StructType *const llvm_S = StructType::create( ctx, "S" );
llvm_S->setBody( members );

How can I ensure that sizeof(S) in the C++ code equals the size of the StructType in the LLVM IR code? The same goes for the offsets of the individual members, e.g., u.b.

It's also the case that I have an array of S allocated in C++:

S *s_array = new S[10];

and I pass s_array to LLVM IR code in which I access individual elements of the array. In order for this to work, sizeof(S) has to be the same in both C++ and LLVM IR so this:

%elt = getelementptr %S* %ptr_to_start, i64 1

will access s_array[1] properly.
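To make the stride requirement concrete, here is a minimal plain C++ sketch (no LLVM involved) of the address arithmetic that indexing an array performs: base plus index times element stride. The helper names element_address and stride_reaches_element are hypothetical, introduced only for illustration; the assumption is that the IR side uses the allocation size of %S as its stride.

```cpp
#include <cassert>
#include <cstddef>

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

// Address produced by indexing with a given element stride: base + index * stride.
// (Hypothetical helper; the stride plays the role of the element's allocation size.)
inline const char *element_address(const void *base, std::size_t index, std::size_t stride) {
    return static_cast<const char *>(base) + index * stride;
}

// True if that stride lands exactly on s_array[index] as the C++ compiler laid it out.
inline bool stride_reaches_element(const S *s_array, std::size_t index, std::size_t stride) {
    assert(s_array != nullptr);
    const char *expected = reinterpret_cast<const char *>(&s_array[index]);
    return element_address(s_array, index, stride) == expected;
}
```

With a stride of sizeof(S) the check holds for every valid index; with any smaller stride it fails from index 1 onward, which is the situation the IR code would be in if its notion of the size of S differed from the C++ one.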

When I compile and run the program below, it outputs:

sizeof(S) = 16
allocSize(S) = 10

The problem is that LLVM is missing 6 bytes of padding between S::s and S::u. The C++ compiler makes the union start on an 8-byte-aligned boundary whereas LLVM does not.
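The padding on the C++ side can be inspected directly with sizeof, alignof and offsetof. The following is a small sketch in plain C++; padding_before_u is a hypothetical helper name, and the concrete numbers depend on the target ABI.

```cpp
#include <cassert>
#include <cstddef>

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

// The compiler places u on a boundary suitable for its most-aligned member (void*).
static_assert(offsetof(S, u) % alignof(S::U) == 0, "u starts on a U-aligned boundary");
// The total size is a multiple of the struct's alignment, so arrays of S stay aligned.
static_assert(sizeof(S) % alignof(S) == 0, "sizeof(S) includes any tail padding");

// Number of padding bytes the C++ compiler inserted between S::s and S::u.
inline std::size_t padding_before_u() {
    assert(offsetof(S, u) >= sizeof(short));
    return offsetof(S, u) - sizeof(short);
}
```

On a machine like the one described here (2-byte short, 8-byte pointers) this would be expected to report the 6 bytes of padding mentioned above. These are the values the LLVM struct has to reproduce.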

I was playing around with DataLayout. For my machine [Mac OS X 10.9.5, g++ Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)], if I print the data layout string, I get:

e-m:o-i64:64-f80:128-n8:16:32:64-S128

If I force-set the data layout to:

e-m:o-i64:64-f80:128-n8:16:32:64-S128-a:64

The only addition is a:64, which means that an object of aggregate type is aligned on a 64-bit boundary. With that change I get the same size. So why isn't the default data layout correct?
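To see where 10 versus 16 can come from, here is a simplified model of natural struct layout in plain C++. It is a sketch, not LLVM's code: Member, Layout, round_up and layout_struct are hypothetical names, and it assumes that an array member is aligned like its element type (so a byte array has alignment 1) and that a minimum aggregate alignment such as a:64 only raises the alignment of the struct as a whole without moving its members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Member {
    std::size_t size;   // bytes
    std::size_t align;  // bytes, power of two
};

struct Layout {
    std::vector<std::size_t> offsets;  // byte offset of each member
    std::size_t store_size;            // size padded to the members' own alignment
    std::size_t align;                 // alignment of the struct as a whole
    std::size_t alloc_size;            // stride between consecutive array elements
};

inline std::size_t round_up(std::size_t n, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

// min_aggregate_align models an "a:<bits>" entry, expressed in bytes (assumption:
// it acts only as a lower bound on the struct's own alignment).
inline Layout layout_struct(const std::vector<Member> &members,
                            std::size_t min_aggregate_align = 1) {
    Layout out{{}, 0, 1, 0};
    std::size_t offset = 0;
    for (const Member &m : members) {
        offset = round_up(offset, m.align);  // pad up to this member's alignment
        out.offsets.push_back(offset);
        offset += m.size;
        if (m.align > out.align) out.align = m.align;
    }
    out.store_size = round_up(offset, out.align);
    if (min_aggregate_align > out.align) out.align = min_aggregate_align;
    out.alloc_size = round_up(out.store_size, out.align);
    return out;
}
```

Under this model, a 2-byte member followed by an 8-byte array of bytes gives offsets 0 and 2 and an allocation size of 10. Raising the minimum aggregate alignment to 8 bytes gives an allocation size of 16 but leaves the second member at offset 2, so the sizes agree while the offset of u still would not. Giving the second member an alignment of 8 (for example by modeling the union with a pointer-sized member instead of a byte array) yields offset 8 and size 16. If the model applies, the actual offsets can be checked on the LLVM side with DataLayout::getStructLayout and StructLayout::getElementOffset.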


Complete working program below

// LLVM
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/ExecutionEngine/MCJIT.h>
#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IR/Type.h>
#include <llvm/Support/TargetSelect.h>

// standard
#include <cstdlib>   // exit
#include <iostream>
#include <memory>
#include <string>
#include <vector>

using namespace std;
using namespace llvm;

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

ExecutionEngine* createEngine( Module *module ) {
    InitializeNativeTarget();
    InitializeNativeTargetAsmPrinter();

    unique_ptr<Module> u( module );
    EngineBuilder eb( move( u ) );
    string errStr;
    eb.setErrorStr( &errStr );
    eb.setEngineKind( EngineKind::JIT );
    ExecutionEngine *const exec = eb.create();
    if ( !exec ) {
        cerr << "Could not create ExecutionEngine: " << errStr << endl;
        exit( 1 );
    }
    return exec;
}

int main() {
    LLVMContext ctx;

    vector<Type*> members;
    members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
    members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );

    StructType *const llvm_S = StructType::create( ctx, "S" );
    llvm_S->setBody( members );

    Module *const module = new Module( "size_test", ctx );
    ExecutionEngine *const exec = createEngine( module );
    DataLayout const *const layout = exec->getDataLayout();
    module->setDataLayout( layout );

    cout << "sizeof(S) = " << sizeof( S ) << endl;
    cout << "allocSize(S) = " << layout->getTypeAllocSize( llvm_S ) << endl;

    delete exec;
    return 0;
}
Paul J. Lucas asked Aug 30 '15 17:08



1 Answer

The main purpose of DataLayout is to know the alignment of elements. If your code doesn't need to know the size, alignment, or offsets of elements (and LLVM doesn't really have a useful way beyond the GEP instruction to find an offset, so you can pretty much ignore the offset part), you won't need a DataLayout until you come to execute the IR or generate an object file from it.

(I did have some very interesting bugs from trying to compile 32-bit code with a 64-bit "native" DataLayout when I implemented the -m32 switch for my compiler. It is not a good idea to switch DataLayout in the middle of compilation, which I did because I used the "default" one and then set a different one when it came to creating the actual object file.)

Mats Petersson answered Oct 25 '22 08:10
