
Is the return type of a function part of the mangled name?

Suppose I have two functions with the same parameter types and name (not in the same program):

std::string foo(int x) {
  return "hello"; 
}

int foo(int x) {
  return x;
}

Will they have the same mangled name once compiled?

Is the return type part of the mangled name in C++?

sdgfsdh asked Nov 24 '16 16:11



2 Answers

As mangling schemes aren't standardised, there's no single answer to this question; the closest thing to an actual answer would be to look at mangled names generated by the most common mangling schemes. To my knowledge, those are the GCC and MSVC schemes, in alphabetical order, so...


GCC:

To test this, we can use a simple program.

#include <string>
#include <cstdlib>

std::string foo(int x) { return "hello"; }
//int         foo(int x) { return x; }

int main() {
    // Assuming executable file named "a.out".
    system("nm a.out");
}

Compile and run with GCC or Clang, and it'll list the symbols it contains. Depending on which of the functions is uncommented, the results will be:

// GCC:
// ----

std::string foo(int x) { return "hello"; } // _Z3fooB5cxx11i
                                             // foo[abi:cxx11](int)
int         foo(int x) { return x; }       // _Z3fooi
                                             // foo(int)

// Clang:
// ------

std::string foo(int x) { return "hello"; } // _Z3fooi
                                             // foo(int)
int         foo(int x) { return x; }       // _Z3fooi
                                             // foo(int)

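The GCC scheme contains relatively little information, and for ordinary (non-template) functions like these it does not include the return type:

  • Prefix: _Z, which marks the symbol as a mangled C++ name.
  • Name: 3foo for ::foo (the length of the name, then the name itself).
  • Parameters: i for int.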

Even so, the two names differ when compiled with GCC (but not with Clang in this test), because GCC appends the ABI tag B5cxx11 to indicate that the std::string version uses the cxx11 ABI.

Note that the compiler does still keep track of the return type and makes sure signatures match within a translation unit; it just doesn't use the function's mangled name to do so.
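The demangled forms listed above can also be reproduced programmatically. This is a minimal sketch, assuming GCC or Clang on a platform that uses this mangling scheme, where <cxxabi.h> provides abi::__cxa_demangle. The helper name demangle is hypothetical and not part of the original test program.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang only; not available with MSVC)

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: returns the human-readable form of a mangled symbol,
// or the input unchanged if the demangler rejects it.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'd buffer that the caller must free,
    // so hand ownership to a unique_ptr with std::free as the deleter.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);
    // status == 0 means success; anything else leaves text empty.
    return (status == 0 && text) ? std::string(text.get()) : mangled;
}
```

Called on the two symbols from the GCC listing, demangle("_Z3fooi") gives foo(int) and demangle("_Z3fooB5cxx11i") gives foo[abi:cxx11](int). Neither result mentions a return type, which matches the breakdown above.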


MSVC:

To test this, we can use a simple program, as above.

#include <string>
#include <cstdlib>
    
std::string foo(int x) { return "hello"; }
//int         foo(int x) { return x; }
    
int main() {
    // Assuming object file named "a.obj".
    // Pipe to file, because there are a lot of symbols when <string> is included.
    system("dumpbin /symbols a.obj > a.txt");
}

Compile and run with Visual Studio, and a.txt will list the symbols it contains. Depending on which of the functions is uncommented, the results will be:

std::string foo(int x) { return "hello"; }
  // ?foo@@YA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@H@Z
  // class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > __cdecl foo(int)
int         foo(int x) { return x; }
  // ?foo@@YAHH@Z
  // int __cdecl foo(int)

The MSVC scheme contains the entire declaration, including things that weren't explicitly specified:

  • Name: foo@ for ::foo (after the leading ? that starts every decorated name), followed by @ to terminate.
  • Symbol type: Everything after the name-terminating @.
  • Type and member status: Y for "non-member function".
  • Calling convention: A for __cdecl.
  • Return type:
    • H for int.
    • ?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@ (followed by @ to terminate) for std::basic_string<char, std::char_traits<char>, std::allocator<char>> (std::string for short).
  • Parameter list: H for int (followed by @ to terminate).
  • Exception specifier: Z for throw(...); this one is omitted from demangled names unless it's something else, probably because MSVC just ignores it anyway.

This allows it to whine at you if declarations aren't identical across every compilation unit.
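To make the breakdown above concrete, here is a rough sketch that splits a decorated name such as ?foo@@YAHH@Z into the same pieces. It is not a real demangler: the struct MsvcParts and the function decode_simple_msvc are hypothetical names. It assumes a non-member __cdecl function whose return type and parameters are all single-letter primitive codes, and it rejects everything else, including the std::string version and empty parameter lists. Only H for int comes from the listing above; the other type codes in the table are added from general knowledge of the scheme.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical result type holding the pieces described in the list above.
struct MsvcParts {
    bool ok = false;                 // false if the name is outside this sketch's scope
    std::string name;
    std::string calling_convention;
    std::string return_type;
    std::string parameters;          // comma-separated
};

// Sketch: handles only ?name@@YA<ret><params>@Z with primitive type codes.
MsvcParts decode_simple_msvc(const std::string& sym) {
    // Assumption: a small subset of the primitive type codes.
    static const std::map<char, std::string> type_codes = {
        {'D', "char"}, {'H', "int"}, {'M', "float"}, {'N', "double"}};

    MsvcParts parts;
    const std::size_t name_end = sym.find("@@Y");  // name terminator + "non-member function"
    if (sym.size() < 2 || sym[0] != '?' || name_end == std::string::npos || name_end < 2)
        return parts;

    const std::size_t cc = name_end + 3;           // position of the calling convention
    if (sym.size() < cc + 5) return parts;         // need cc, return type, 1+ params, "@Z"
    if (sym[cc] != 'A') return parts;              // only A (__cdecl) is handled
    if (sym.compare(sym.size() - 2, 2, "@Z") != 0) return parts;

    const auto ret = type_codes.find(sym[cc + 1]);
    if (ret == type_codes.end()) return parts;     // e.g. class types such as std::string

    std::string params;
    for (std::size_t i = cc + 2; i < sym.size() - 2; ++i) {
        const auto p = type_codes.find(sym[i]);
        if (p == type_codes.end()) return parts;
        if (!params.empty()) params += ", ";
        params += p->second;
    }

    parts.ok = true;
    parts.name = sym.substr(1, name_end - 1);
    parts.calling_convention = "__cdecl";
    parts.return_type = ret->second;
    parts.parameters = params;
    return parts;
}
```

For ?foo@@YAHH@Z this reports the name foo, the calling convention __cdecl, the return type int and the parameter list int. The point is that the return type sits in the symbol itself, right after the calling convention.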


Generally, most compilers will use one of those schemes (or sometimes a variation thereof) when targeting *nix or Windows, respectively, but this isn't guaranteed. For example...

  • Clang, to my knowledge, will use the GCC scheme for *nix, or the MSVC scheme for Windows.
  • Intel C++ uses the GCC scheme for Linux and Mac, and the MSVC scheme (with a few minor variations) for Windows.
  • The Borland and Watcom compilers have their own schemes.
  • The Symantec and Digital Mars compilers generally use the MSVC scheme, with a few small changes.
  • Older versions of GCC, and a lot of UNIX tools, use a modified version of cfront's mangling scheme.
  • And so on...

The information on schemes used by other compilers comes from Agner Fog's PDF.


Note:

Examining the generated symbols, it becomes apparent that GCC's mangling scheme doesn't provide the same level of protection against Machiavelli as MSVC's. Consider the following:

// foo.cpp
#include <string>

// Simple wrapper class, to avoid encoding `cxx11 ABI` into the GCC name.
class MyString {
    std::string data;

  public:
    MyString(const char* const d) : data(d) {}

    operator std::string() { return data; }
};

// Evil.
MyString foo(int i) { return "hello"; }

// -----

// main.cpp
#include <iostream>

// Evil.
int foo(int);

int main() {
    std::cout << foo(3) << '\n';
}

If we compile each source file separately, then attempt to link the object files together...

  • GCC: MyString, due to not being part of the cxx11 ABI, causes MyString foo(int) to be mangled as _Z3fooi, just like int foo(int). This allows the object files to be linked, and an executable is produced. Attempting to run it causes a segfault.
  • MSVC: The linker will look for ?foo@@YAHH@Z; as we instead supplied ?foo@@YA?AVMyString@@H@Z, linking will fail.

Considering this, a mangling scheme that includes the return type is safer, even though functions can't be overloaded solely on differences in return type.

Justin Time - Reinstate Monica answered Oct 12 '22 11:10


No, and I expect that their mangled name will be the same with all modern compilers. More importantly, using them in the same program results in undefined behavior. Functions in C++ cannot differ only in their return type.

Sam Varshavchik answered Oct 12 '22 10:10