I have been trying to build and execute LLVM modules. My code for generating the modules is quite long, so I won't post it here. Instead my question is about how Clang and LLVM work together to achieve name mangling. I will explain my specific issue to motivate the question.
Here is the source-code of one of my LLVM modules:
#include <iostream>
int main() {
std::cout << "Hello, world. " << std::endl;
return 0;
}
Here is the generated LLVM IR; it is too big for StackOverflow.
When I try to execute my module using lli
, I get the following error:
LLVM ERROR: Program used external function '__ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc' which could not be resolved!
Running the symbol through a demangler, the missing symbol is:
_std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(unsigned long, char)
The extra _
is suspicious, and the function without the leading underscore seems to exist in the IR!
; Function Attrs: alwaysinline ssp uwtable
define available_externally hidden void @_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc(%"class.std::__1::basic_string"*, i64, i8 signext) unnamed_addr #2 align 2 {
%4 = alloca %"class.std::__1::basic_string"*, align 8
%5 = alloca i64, align 8
%6 = alloca i8, align 1
store %"class.std::__1::basic_string"* %0, %"class.std::__1::basic_string"** %4, align 8
store i64 %1, i64* %5, align 8
store i8 %2, i8* %6, align 1
%7 = load %"class.std::__1::basic_string"*, %"class.std::__1::basic_string"** %4, align 8
%8 = load i64, i64* %5, align 8
%9 = load i8, i8* %6, align 1
call void @_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC2Emc(%"class.std::__1::basic_string"* %7, i64 %8, i8 signext %9)
ret void
}
I am on macOS, so a leading underscore is to be expected, but I think that the Clang might be adding it twice.
I looked through the LLVM / Clang source, and it seems that there are two mangling steps:
However, this is just my theory. Can someone could explain how the mangling process works in Clang and LLVM? How should I create my llvm::DataLayout
objects to get the correct mangling for my platform?
nm -gU /usr/lib/libc++.dylib` and `nm -gU /usr/lib/libc++abi.dylib` do not contain `__ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc
When I try to compile the IR, I get this error:
llc generated.ll
clang++ generated.s
Undefined symbols for architecture x86_64:
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::data() const", referenced from:
std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> > std::__1::__pad_and_output<char, std::__1::char_traits<char> >(std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >, char const*, char const*, char const*, std::__1::ios_base&, char) in generated-b4252a.o
"std::__1::basic_ostream<char, std::__1::char_traits<char> >::sentry::operator bool() const", referenced from:
std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::fill() const", referenced from:
std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::rdbuf() const", referenced from:
std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >::ostreambuf_iterator(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::widen(char) const", referenced from:
std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::endl<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) in generated-b4252a.o
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(unsigned long, char)", referenced from:
std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> > std::__1::__pad_and_output<char, std::__1::char_traits<char> >(std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >, char const*, char const*, char const*, std::__1::ios_base&, char) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::setstate(unsigned int)", referenced from:
std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
ld: symbol(s) not found for architecture x86_64
clang-3.9: error: linker command failed with exit code 1 (use -v to see invocation)
To prevent the C++ compiler from mangling the name of a function, you can apply the extern "C" linkage specifier to the declaration or declarations, as shown in the following example: extern "C" { int f1(int); int f2(int); int f3(int); };
Name mangling is the encoding of function and variable names into unique names so that linkers can separate common names in the language. Type names may also be mangled. Name mangling is commonly used to facilitate the overloading feature and visibility within different scopes.
clang.llvm.org. Clang operates in tandem with the LLVM compiler back end and has been a subproject of LLVM 2.6 and later. As with LLVM, it is free and open-source software under the Apache License 2.0 software license. Its contributors include Apple, Microsoft, Google, ARM, Sony, Intel, and AMD.
I wouldn't suspect a name mangling issue. C++ name mangling happens at the front-end (i.e. clang
) and it's part of a pretty well-defined/-documented ABI standard.
Moreover, I don't think there is a spurious underscore, cause that does not produce a valid C++
name back and the mangled name in the pastebin link that you provided appears as:
_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc
I'm not on Mac OS, but simulating with my LLVM 3.8.1 on Linux (using --stdlib=libc++
), using the same source and matching the IR line by line,
I get the following symbol:
_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEmc
which demangles back to:
std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__init(unsigned long, char)
which I guess does pretty much the same construction of some sort.
So, I believe that your linker picks up the wrong libc++
version.
You could check the symbols available in the libc++
that is tied to the clang/LLVM that you are using, found in the directory given by llvm-config --libdir
or even checking the rpath entry of your toolchain binaries with readelf -d $(which lli)
.
If there are multiple LLVM installations (e.g. a system one and one that you compiled from sources yourself), you might have to play around with the -L
option of clang
which directs ld
to add that path in its search list.
A quick alternative (that I wouldn't recommend for regular use) is to do this on the command line:
LD_LIBRARY_PATH=$(llvm-config --libdir) clang generated.s
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With