 

Why doesn't g++ generate "raw" symbols?

From C we know what legal variable names are. The general regex for legal names is roughly [A-Za-z_][A-Za-z0-9_]*.
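A small sketch of that rule using std::regex from the C++ standard library. The helper name is_c_identifier is hypothetical, and the pattern deliberately ignores implementation-specific extensions such as $ in identifiers or universal character names. Note that the g++-mangled name from the question passes the check, while a spelled-out qualified name does not.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `name` is a plain C identifier spelled with the
// basic character set, i.e. a letter or underscore followed by any number of
// letters, digits, or underscores. Compiler extensions are not accepted.
bool is_c_identifier(const std::string& name) {
    static const std::regex pattern("[A-Za-z_][A-Za-z0-9_]*");
    return std::regex_match(name, pattern);  // must match the whole string
}
```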

Using dlsym we can look up symbols by arbitrary strings, and some C++ ABIs mangle names into symbols that contain @.

My question is: can arbitrary strings be used as symbol names? The dlsym documentation does not seem to say anything about this.
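A sketch of the dlsym side of this, assuming Linux with glibc and a program dynamically linked against libstdc++ (the g++ default): dlsym takes any NUL-terminated string and simply searches for an exported symbol with exactly that spelling. RTLD_DEFAULT is a glibc/BSD extension that searches the global scope (glibc exposes it under _GNU_SOURCE, which g++ defines by default). The names is_exported and keep_cxx_runtime are hypothetical. On glibc older than 2.34, link with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <exception>

// Taking the address of a libstdc++ function makes the program genuinely
// depend on (and therefore load) the C++ runtime, even if the linker was
// invoked with --as-needed.
void (*volatile keep_cxx_runtime)() = &std::terminate;

// dlsym accepts any NUL-terminated string; it only reports whether some
// loaded object exports a symbol with exactly that spelling.
bool is_exported(const char* name) {
    return dlsym(RTLD_DEFAULT, name) != nullptr;
}
```

With this, is_exported("_ZSt9terminatev") finds the mangled spelling of std::terminate(), while is_exported("std::terminate()") fails: the lookup string is arbitrary, but no object exports a symbol spelled the human-readable way.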

Another question that came up appears to imply that arbitrary null-terminated symbol names are entirely possible. This leads me to ask the following question:

Why doesn't g++ emit raw function signatures, with name and parameter list, including namespace and class membership?

Here's what I mean:

```cpp
namespace test {
class A
{
    int myFunction(const int a);
};
}

namespace test {
int A::myFunction(const int a){return a * 2;}
}
```

This does not get compiled to

int ::test::A::myFunction(const int a)\0

Instead, on my 64-bit machine with g++ 4.9.2, it gets compiled to

0000000000000000 T _ZN4test1A10myFunctionEi

This output was produced by nm. The code was compiled with g++ -c test.cpp -o out.
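To see how _ZN4test1A10myFunctionEi is put together, here is a toy encoder for a tiny subset of the Itanium C++ ABI mangling rules: _Z prefix, N...E around a nested name, each component written as its length followed by its characters, then one code per parameter type. mangle_subset is a hypothetical helper written for this illustration; it ignores substitutions, templates, const member functions, and user-defined parameter types, so treat it as a sketch rather than a real mangler.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: mangles a non-template, non-const function whose
// parameters are all builtin types from the table below.
std::string mangle_subset(const std::vector<std::string>& scopes,
                          const std::vector<std::string>& params) {
    static const std::map<std::string, std::string> builtin = {
        {"void", "v"}, {"bool", "b"}, {"char", "c"},
        {"int", "i"},  {"long", "l"}, {"double", "d"}};

    assert(!scopes.empty());
    std::string out = "_Z";                   // every mangled name starts with _Z
    const bool nested = scopes.size() > 1;
    if (nested) out += "N";                   // N ... E brackets a nested name
    for (const std::string& s : scopes)
        out += std::to_string(s.size()) + s;  // <length><identifier>
    if (nested) out += "E";

    if (params.empty()) out += "v";           // f() is encoded as f(void)
    for (const std::string& p : params) {
        auto it = builtin.find(p);
        if (it == builtin.end())
            throw std::invalid_argument("unsupported parameter type: " + p);
        out += it->second;                    // only the parameter's type is encoded
    }
    return out;
}
```

For example, mangle_subset({"test", "A", "myFunction"}, {"int"}) reproduces the symbol nm printed. The return type, the parameter name a, and the top-level const on the parameter never enter the encoding, because none of them are part of the function's signature for linkage purposes.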

Ultimate Hawk asked Aug 10 '15 at 14:08



3 Answers

I'm sure this decision was made pragmatically, to avoid having to change pre-existing C linkers (the scheme quite possibly originated with cfront). By emitting symbols that use only the character set the C linker already accepts, you avoid a potentially large number of updates and can use the linker off the shelf.

Additionally, C and C++ are widely portable languages, and their implementers wouldn't want to risk breaking a more obscure object-file format (perhaps on an embedded system) by putting unexpected characters into symbol names.

Finally, since you can always demangle (with a tool like c++filt, for example), a full text representation probably didn't seem worth it.
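Demangling is also available in-process. A minimal sketch, assuming GCC or Clang, where abi::__cxa_demangle is declared in <cxxabi.h>; the wrapper name demangle is chosen here for illustration. Passing a null output buffer makes __cxa_demangle allocate the result with malloc, so the caller frees it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium-ABI symbol, much like c++filt does on the command
// line. Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);  // buffer was allocated by __cxa_demangle with malloc
    return result;
}
```

Applied to the symbol from the question, this yields test::A::myFunction(int): the qualified name and parameter types are recoverable, but the parameter name and the top-level const were never stored.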

P.S. You would absolutely not want to include the parameter name in the symbol name: people will not be happy if renaming a parameter breaks the ABI. It's hard enough to keep ABI compatibility already.

Mark B answered Nov 11 '22 at 04:11



GCC is compliant with the Itanium C++ ABI. If your question is “Why does the Itanium C++ ABI require names to be mangled that way?” then the answer is probably

  1. because its designers thought this would be a good idea, and
  2. shorter symbols make for smaller object files and faster dynamic linking.

For the second point, there is a pretty good explanation in Ulrich Drepper's article How To Write Shared Libraries.
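To make the dynamic-linking point concrete: when the dynamic linker resolves a symbol, it hashes the name and, when a hash matches, compares the full strings, so both steps cost time proportional to the name's length. Below is a sketch of the GNU-style hash used by DT_GNU_HASH sections (the DJB function h * 33 + c, seeded with 5381); older objects use the SysV ELF hash instead, and gnu_hash is a name chosen here for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sketch of the GNU-style ELF symbol hash. The loop visits every character
// once, so hashing cost grows linearly with the length of the symbol name.
std::uint32_t gnu_hash(const std::string& name) {
    std::uint32_t h = 5381;
    for (unsigned char c : name) {
        h = h * 33 + c;  // unsigned arithmetic wraps modulo 2^32
    }
    return h;
}
```

For the question's function, the mangled name _ZN4test1A10myFunctionEi is 24 characters long, whereas the spelled-out int ::test::A::myFunction(const int a) is 38, so every hash computation and every confirming string comparison would touch more bytes.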

5gon12eder answered Nov 11 '22 at 05:11



  1. Because of the limitations that linkers (including the OS's dynamic linker) impose on exported names: character set and length. Name mangling arose precisely because of these limitations.
    • Corollary: in environments where these limitations don't exist (various VMs that use their own linkers, e.g. .NET and Java), mangling doesn't exist either.
  2. Each compiler whose exports are incompatible with other compilers' must use a different scheme, because the linker (static or dynamic) doesn't care about ABIs; all it cares about is identifiers.
ivan_pozdeev answered Nov 11 '22 at 03:11
