I'm writing a compiler for a language that requires lots of runtime functions. I'm using LLVM as my backend, so the codegen needs types for all those runtime declarations (functions, structs, etc.). Instead of defining all of them manually using the LLVM APIs or handwriting the LLVM IR, I'd like to write the headers in C and compile them to bitcode that the compiler can pull in with LLVMParseBitcodeInContext2.
The issue I'm having is that clang doesn't seem to keep any of the type declarations that aren't used by any function definitions. Clang has -femit-all-decls, which sounds like it's supposed to solve this, but unfortunately it doesn't, and Googling suggests it's misnamed: it only affects unused definitions, not declarations.
I then thought that if I compiled only the headers into .gch files, I could pull them in with LLVMParseBitcodeInContext2 the same way (since the docs say they use "the same" bitcode format). However, doing so fails with error: Invalid bitcode signature, so something must be different. Perhaps the difference is small enough to work around?
Any suggestions or relatively easy workarounds that can be automated for a complex runtime? I'd also be interested if someone has a totally alternative suggestion on approaching this general use case, keeping in mind I don't want to statically link in the runtime function bodies for every single object file I generate, just the types. I imagine this is something other compilers have needed as well so I wouldn't be surprised if I'm approaching this wrong.
e.g. given this input:
struct Foo {
    int a;
    int b;
};
struct Foo *something_with_foo(struct Foo *foo);
I need a bitcode file with the equivalent of this IR:
; ...etc...
%struct.Foo = type { i32, i32 }
declare %struct.Foo* @something_with_foo(%struct.Foo*)
; ...etc...
I could write it all by hand, but this would be duplicative as I also need to create C headers for other interop and it'd be ideal not to have to keep them in sync manually. The runtime is rather large. I guess I could also do things the other way around: write the declarations in LLVM IR and generate the C headers.
Someone else asked about this years back, but the proposed solutions are rather hacky and fairly impractical for a runtime of this size and type complexity: Clang - Compiling a C header to LLVM IR/bitcode
So, clang doesn't actually filter out the unused declarations. It defers emitting forward declarations until their first use: whenever a function is used, it checks whether the declaration has already been emitted and, if not, emits it.
You can look at these lines in the clang repo.
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
  if (!FD->doesDeclarationForceExternallyVisibleDefinition())
    return;
The simple fix here would be to either comment out the last two lines or just add && false to the second condition.
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
  if (!FD->doesDeclarationForceExternallyVisibleDefinition() && false)
    return;
This will cause clang to emit a declaration as soon as it sees it. It might also change the order in which definitions appear in your .ll (or .bc) files, which I'm assuming is not an issue for you.
To make it cleaner, you can also add a command-line flag such as --emit-all-declarations and check it here before returning early.
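As a rough illustration of what that check would amount to, here is a small C model of the guard's control flow. It is not clang code: decl_info, defer_declaration and the emit_all_declarations parameter are hypothetical names standing in for the FD queries above and for the proposed flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two FD queries used in the clang snippet. */
struct decl_info {
    bool has_body;                   /* models doesThisDeclarationHaveABody() */
    bool forces_external_definition; /* models doesDeclarationForceExternallyVisibleDefinition() */
};

/* Returns true when the declaration would be skipped for now and only
 * emitted lazily on first use. emit_all_declarations models the proposed
 * (not existing) command-line flag. */
bool defer_declaration(const struct decl_info *d, bool emit_all_declarations)
{
    assert(d != NULL);
    if (!d->has_body) {
        if (!d->forces_external_definition && !emit_all_declarations)
            return true;
    }
    return false;
}
```

With the flag off, a body-less declaration is deferred exactly as before; with it on, the early return is never taken, which has the same effect as the && false edit but can be toggled per invocation.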
Clang's precompiled header implementation does not seem to output LLVM IR, only the AST (Abstract Syntax Tree), so that the header does not need to be parsed again:
The AST file itself contains a serialized representation of Clang’s abstract syntax trees and supporting data structures, stored using the same compressed bitstream as LLVM’s bitcode file format.
The underlying binary format may be the same, but it sounds like the content is different and LLVM's bitstream format is merely a container in this case. This is not very clear from the help page on the website, so I am just speculating. An LLVM/Clang expert could help clarify this point.
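One way to see the difference concretely is to look at the first four bytes of each file. As far as I know, raw LLVM bitcode starts with the magic bytes 'B' 'C' 0xC0 0xDE, while a clang AST/PCH file written in the default raw format starts with 'C' 'P' 'C' 'H', which would explain the Invalid bitcode signature error. A minimal sketch in C, where classify_magic and the file_kind enum are illustrative names of my own:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum file_kind {
    KIND_LLVM_BITCODE, /* starts with 'B' 'C' 0xC0 0xDE */
    KIND_CLANG_AST,    /* starts with 'C' 'P' 'C' 'H' (assumed default PCH layout) */
    KIND_UNKNOWN
};

/* Classifies a file by its first four bytes. Only the leading magic is
 * inspected; nothing else about the file is validated. */
enum file_kind classify_magic(const unsigned char *bytes, size_t len)
{
    static const unsigned char llvm_magic[4] = { 'B', 'C', 0xC0, 0xDE };
    static const unsigned char clang_magic[4] = { 'C', 'P', 'C', 'H' };

    assert(bytes != NULL);
    if (len < 4)
        return KIND_UNKNOWN;
    if (memcmp(bytes, llvm_magic, 4) == 0)
        return KIND_LLVM_BITCODE;
    if (memcmp(bytes, clang_magic, 4) == 0)
        return KIND_CLANG_AST;
    return KIND_UNKNOWN;
}
```

Reading the first four bytes of the .gch and of a known-good .bc file and passing them to classify_magic should show which of the two the bitcode parser can accept.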
Unfortunately, there does not seem to be an elegant way around this. What I suggest in order to minimize the effort required to achieve what you want is to build a minimal C/C++ source file that in some way uses all the declarations that you want to be compiled to LLVM IR. For example, you just need to declare a pointer to a struct to ensure it does not get optimized away, and you may just provide an empty definition for a function to keep its signature.
Once you have a minimal source file, compile it with clang -O0 -c -emit-llvm -o precompiled.bc to get a bitcode module containing all the definitions (or use -S instead of -c, with a .ll output name, to get textual LLVM IR).
An example from the snippet you posted:
#include <stddef.h> // for NULL

struct Foo {
    int a;
    int b;
};
// Fake function definition.
struct Foo *something_with_foo(struct Foo *foo)
{
    return NULL;
}
// A global variable.
struct Foo *x;
Output that shows that definitions are kept: https://godbolt.org/g/2F89BH
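For a large runtime, a file like the one above could be generated rather than written by hand. The sketch below is one possible way to do that, not an established tool: write_anchor_source, the anchor_ prefixes and the runtime.h header name are all hypothetical. It deviates from the example above in two ways that are my own assumptions. It emits a global of the struct type itself rather than a pointer, since newer clang versions that use opaque pointers may not emit the struct body for a pointer alone. It also references each function through a function-pointer global instead of a fake body, so that only a declaration is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes C source into out that includes header and "uses" every listed
 * struct and function once. Returns the number of characters written
 * (excluding the terminating NUL), or -1 if out is too small. */
int write_anchor_source(char *out, size_t cap, const char *header,
                        const char *const *structs, size_t n_structs,
                        const char *const *funcs, size_t n_funcs)
{
    assert(out != NULL && header != NULL);

    int n = snprintf(out, cap, "#include \"%s\"\n", header);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t len = (size_t)n;

    /* One global per struct, so the type is referenced by the module. */
    for (size_t i = 0; i < n_structs; i++) {
        n = snprintf(out + len, cap - len, "struct %s anchor_type_%s;\n",
                     structs[i], structs[i]);
        if (n < 0 || (size_t)n >= cap - len)
            return -1;
        len += (size_t)n;
    }

    /* One function-pointer global per function, so it is referenced
     * without providing a body. */
    for (size_t i = 0; i < n_funcs; i++) {
        n = snprintf(out + len, cap - len,
                     "void (*anchor_fn_%s)(void) = (void (*)(void))%s;\n",
                     funcs[i], funcs[i]);
        if (n < 0 || (size_t)n >= cap - len)
            return -1;
        len += (size_t)n;
    }
    return (int)len;
}
```

The generated text would then be compiled with the same clang invocation as the hand-written file. Whether a given clang version keeps every type and declaration this way is worth checking against its actual output, and the anchor_ globals would need to be ignored or stripped by the consuming compiler.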