
Name mangling confusion in LLVM

I have been trying to build and execute LLVM modules. My code for generating the modules is quite long, so I won't post it here. Instead, my question is about how Clang and LLVM work together to achieve name mangling. I will explain my specific issue to motivate the question.

Here is the C++ source code for one of my LLVM modules:

#include <iostream>

int main() {
  std::cout << "Hello, world. " << std::endl;
  return 0;
}

The generated LLVM IR is too big to post on StackOverflow, so it is linked externally.

When I try to execute my module using lli, I get the following error:

LLVM ERROR: Program used external function '__ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc' which could not be resolved!

Running the symbol through a demangler, the missing symbol is:

_std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(unsigned long, char)

The extra _ is suspicious, and the function without the leading underscore seems to exist in the IR!

; Function Attrs: alwaysinline ssp uwtable
define available_externally hidden void @_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc(%"class.std::__1::basic_string"*, i64, i8 signext) unnamed_addr #2 align 2 {
  %4 = alloca %"class.std::__1::basic_string"*, align 8
  %5 = alloca i64, align 8
  %6 = alloca i8, align 1
  store %"class.std::__1::basic_string"* %0, %"class.std::__1::basic_string"** %4, align 8
  store i64 %1, i64* %5, align 8
  store i8 %2, i8* %6, align 1
  %7 = load %"class.std::__1::basic_string"*, %"class.std::__1::basic_string"** %4, align 8
  %8 = load i64, i64* %5, align 8
  %9 = load i8, i8* %6, align 1
  call void @_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC2Emc(%"class.std::__1::basic_string"* %7, i64 %8, i8 signext %9)
  ret void
}

I am on macOS, so a leading underscore is to be expected, but I think that Clang might be adding it twice.
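One way to check the "extra underscore" reading is to strip a single Mach-O prefix and demangle what is left. The sketch below is only an illustration: `stripMachOPrefix` and `demangle` are hypothetical helper names, and it assumes a toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>` (as Clang and GCC do). The exact spelling of the demangled text, such as the spacing between closing angle brackets, can differ between demangler versions.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Hypothetical helper: drop exactly one leading underscore, i.e. the prefix
// that Mach-O places in front of global symbol names.
std::string stripMachOPrefix(const std::string &symbol) {
  assert(!symbol.empty() && "symbol name must not be empty");
  return symbol[0] == '_' ? symbol.substr(1) : symbol;
}

// Hypothetical helper: demangle an Itanium C++ ABI name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const std::string &mangled) {
  int status = 0;
  // __cxa_demangle returns a malloc'ed buffer, so it is released with free.
  std::unique_ptr<char, decltype(&std::free)> text(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
      &std::free);
  return (status == 0 && text) ? std::string(text.get()) : std::string();
}
```

Calling `demangle(stripMachOPrefix(name))` on the symbol from the lli error should give the `basic_string(unsigned long, char)` constructor without the stray leading `_`, which would be consistent with the reported name being the IR name plus one platform prefix.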

I looked through the LLVM / Clang source, and it seems that there are two mangling steps:

  1. Taking possibly overloaded C++ functions and mangling them to unique names for the LLVM IR
  2. Taking a mangled name from the LLVM IR and adding any platform-specific quirks, such as leading underscores
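The second step in the list above can be modelled in a few lines. This is a simplified sketch, not LLVM's implementation: `manglingMode` and `applyGlobalPrefix` are hypothetical names, only the ELF (`e`) and Mach-O (`o`) modes of the data layout's `m:` field are handled, and the layout strings in the usage note are assumed to resemble what Clang emits for x86_64. In LLVM itself this job belongs to `llvm::Mangler`, which consults the module's `DataLayout`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: extract the mangling mode from a data layout string.
// The string is a '-' separated list of specs; the one of the form "m:<c>"
// selects the object-file mangling. Returns '\0' if no such spec exists.
char manglingMode(const std::string &dataLayout) {
  std::size_t start = 0;
  while (start <= dataLayout.size()) {
    std::size_t end = dataLayout.find('-', start);
    if (end == std::string::npos)
      end = dataLayout.size();
    // The current spec occupies the half-open range [start, end).
    if (end - start == 3 && dataLayout.compare(start, 2, "m:") == 0)
      return dataLayout[start + 2];
    start = end + 1;
  }
  return '\0';
}

// Hypothetical helper: turn an IR-level global name into the name that
// appears in the object file. Only two modes are modelled here.
std::string applyGlobalPrefix(const std::string &irName, char mode) {
  assert(!irName.empty() && "IR name must not be empty");
  if (mode == 'o')  // Mach-O: non-private globals get a '_' prefix.
    return "_" + irName;
  return irName;    // ELF ('e') and unmodelled modes: name is unchanged.
}
```

With a layout such as `e-m:o-i64:64-f80:128-n8:16:32:64-S128`, `manglingMode` yields `'o'`, and `applyGlobalPrefix` turns an IR name beginning with `_ZN` into one beginning with `__ZN`. Under this model, a name with two underscores is what a single pass of step 2 produces on macOS, rather than evidence that the prefix was applied twice.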

However, this is just my theory. Could someone explain how the mangling process works in Clang and LLVM? How should I create my llvm::DataLayout objects to get the correct mangling for my platform?


The outputs of `nm -gU /usr/lib/libc++.dylib` and `nm -gU /usr/lib/libc++abi.dylib` do not contain `__ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc`.

When I try to compile the IR, I get this error:

llc generated.ll
clang++ generated.s

Undefined symbols for architecture x86_64:
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::data() const", referenced from:
  std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> > std::__1::__pad_and_output<char, std::__1::char_traits<char> >(std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >, char const*, char const*, char const*, std::__1::ios_base&, char) in generated-b4252a.o
"std::__1::basic_ostream<char, std::__1::char_traits<char> >::sentry::operator bool() const", referenced from:
  std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::fill() const", referenced from:
  std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::rdbuf() const", referenced from:
  std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >::ostreambuf_iterator(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::widen(char) const", referenced from:
  std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::endl<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) in generated-b4252a.o
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(unsigned long, char)", referenced from:
  std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> > std::__1::__pad_and_output<char, std::__1::char_traits<char> >(std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >, char const*, char const*, char const*, std::__1::ios_base&, char) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::setstate(unsigned int)", referenced from:
  std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
ld: symbol(s) not found for architecture x86_64
clang-3.9: error: linker command failed with exit code 1 (use -v to see invocation)

asked Feb 03 '17 17:02 by sdgfsdh



1 Answer

I wouldn't suspect a name mangling issue. C++ name mangling happens in the front end (i.e. Clang), and it follows a well-defined, well-documented ABI standard (the Itanium C++ ABI on macOS and Linux).

Moreover, I don't think there is a spurious underscore, because a name with one would not demangle back to a valid C++ name, and the mangled name in the pastebin link that you provided appears as:

_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc

I'm not on macOS, but simulating with my LLVM 3.8.1 on Linux (using --stdlib=libc++), with the same source and matching the IR line by line, I get the following symbol:

_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEmc

which demangles back to:

std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__init(unsigned long, char)

which I guess performs much the same construction, just through a differently named member function.

So, I believe that your linker picks up the wrong libc++ version.

You could check which symbols are available in the libc++ that belongs to the Clang/LLVM you are using; it lives in the directory printed by llvm-config --libdir. You could also check the rpath entry of your toolchain binaries with readelf -d $(which lli). Note that readelf is ELF-specific; on macOS, otool -l $(which lli) prints the corresponding load commands.

If there are multiple LLVM installations (e.g. a system one and one that you compiled from source yourself), you might have to play around with clang's -L option, which directs ld to add that path to its search list. A quick alternative (that I wouldn't recommend for regular use) is to do this on the command line:

LD_LIBRARY_PATH=$(llvm-config --libdir) clang generated.s

answered Oct 03 '22 00:10 by compor