If I have a C++ program that declares a struct, say:
struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};
and I generate some LLVM IR via the LLVM C++ API to mirror the C++ declaration:
vector<Type*> members;
members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
// since LLVM doesn't support unions, just use an ArrayType that's the same size
members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );
StructType *const llvm_S = StructType::create( ctx, "S" );
llvm_S->setBody( members );
How can I ensure that sizeof(S) in the C++ code is the same as the size of the StructType in the LLVM IR? The same goes for the offsets of the individual members, e.g., u.b.
It's also the case that I have an array of S allocated in C++:
S *s_array = new S[10];
and I pass s_array to LLVM IR code in which I access individual elements of the array. For this to work, sizeof(S) has to be the same in both C++ and LLVM IR so that this:
%elt = getelementptr %S* %ptr_to_start, i64 1
will access s_array[1] properly.
When I compile and run the program below, it outputs:
sizeof(S) = 16
allocSize(S) = 10
The problem is that LLVM is missing 6 bytes of padding between S::s and S::u. The C++ compiler makes the union start on an 8-byte-aligned boundary, whereas LLVM does not.
I was playing around with DataLayout. For my machine [Mac OS X 10.9.5, g++ Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)], if I print the data layout string, I get:
e-m:o-i64:64-f80:128-n8:16:32:64-S128
If I force-set the data layout to:
e-m:o-i64:64-f80:128-n8:16:32:64-S128-a:64
where the addition is a:64, which means that an object of aggregate type aligns on a 64-bit boundary, then I get the same size. So why isn't the default data layout correct?
// LLVM
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/ExecutionEngine/MCJIT.h>
#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IR/Type.h>
#include <llvm/Support/TargetSelect.h>
// standard
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using namespace std;
using namespace llvm;

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

ExecutionEngine* createEngine( Module *module ) {
    InitializeNativeTarget();
    InitializeNativeTargetAsmPrinter();
    unique_ptr<Module> u( module );
    EngineBuilder eb( move( u ) );
    string errStr;
    eb.setErrorStr( &errStr );
    eb.setEngineKind( EngineKind::JIT );
    ExecutionEngine *const exec = eb.create();
    if ( !exec ) {
        cerr << "Could not create ExecutionEngine: " << errStr << endl;
        exit( 1 );
    }
    return exec;
}

int main() {
    LLVMContext ctx;

    vector<Type*> members;
    members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
    members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );
    StructType *const llvm_S = StructType::create( ctx, "S" );
    llvm_S->setBody( members );

    Module *const module = new Module( "size_test", ctx );
    ExecutionEngine *const exec = createEngine( module );
    DataLayout const *const layout = exec->getDataLayout();
    module->setDataLayout( layout );

    cout << "sizeof(S) = " << sizeof( S ) << endl;
    cout << "allocSize(S) = " << layout->getTypeAllocSize( llvm_S ) << endl;

    delete exec;
    return 0;
}
The main purpose of DataLayout is to know the alignment of elements. If you don't need to know the size, alignment, or offsets of elements in your code, you won't need a data layout until you come to execute the IR (or generate an object file from it). [And LLVM doesn't really have a useful way beyond the GEP instruction to find an offset, so you can pretty much ignore the offset part.]
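For what it's worth, the GEP-based way to recover an offset is the classic "GEP from a null pointer" idiom. A sketch in the typed-pointer IR syntax used above, assuming %S is declared as { i16, [8 x i8] }:

```llvm
; offsetof(%S, field 1): index field 1 from a null %S*
; and convert the resulting pointer to an integer.
%p   = getelementptr %S* null, i32 0, i32 1
%off = ptrtoint [8 x i8]* %p to i64
```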
(I did have some very interesting bugs from trying to compile 32-bit code with a 64-bit "native" data layout when I implemented the -m32 switch for my compiler. It's not a good idea to switch DataLayout in the middle of compilation, which I did because I used the "default" one and then set a different one when it came to creating the actual object file.)