Possible GCC linker bug causes error when linking weak and local symbols together

Tags: c++, c, gcc, linker, ld

I'm creating a library and using objcopy to change the visibility of symbols from global to local to avoid exporting a load of internal symbols. If I use the --undefined flag to bring in an unused symbol from the library when linking, GCC gives me the following error:

`_ZStorSt13_Ios_OpenmodeS_' referenced in section `.text' of ./liblibrary.a(library_stripped.o): defined in discarded section `.text._ZStorSt13_Ios_OpenmodeS_[_ZStorSt13_Ios_OpenmodeS_]' of ./liblibrary.a(library_stripped.o)

Here are the two source files and makefile that reproduce the issue.

stringstream.cpp:

#include <iostream>
#include <sstream>
int main() {
   std::stringstream messagebuf;
   messagebuf << "Hello world";
   std::cout << messagebuf.str();
   return 0;
}

library.cpp:

#include <iostream>
#include <sstream>
extern "C" {
void keepme_lib_function() {
    std::stringstream messagebuf;
    messagebuf << "I'm a library function";
    std::cout << messagebuf.str();
}}

Makefile:

CC = g++

all: executable

#build a test program that uses stringstream
stringstream.o : stringstream.cpp
        $(CC) -g -O0 -o $@ -c $^

#build a library that also uses stringstream
liblibrary.a : library.cpp
        $(CC) -g -O0 -o library.o -c $^
        #Set all symbols to local that aren't intended to be exported (keep-global-symbol doesn't discard anything, just changes the binding value to local)
        objcopy --keep-global-symbol 'keepme_lib_function' library.o library_stripped.o 
        #objcopy --wildcard -W '!keepme_*' library.o library_stripped.o 
        rm -f $@
        ar crs $@ library_stripped.o

#Link the program with the library, and force keepme_lib_function to be kept in, even though it isn't referenced.
executable : clean liblibrary.a stringstream.o
        $(CC) -g -o stringstream stringstream.o -L. -Wl,--undefined=keepme_lib_function,-llibrary # -lgcc_eh -lstdc++ #may need to insert these depending on your environment

clean:
        rm -f library_stripped.o
        rm -f stringstream.o
        rm -f library.o
        rm -f liblibrary.a
        rm -f stringstream

If instead of the first objcopy command, I use the second (commented out) one to only weaken the symbols, it works. But I don't want to weaken the symbols, I want them to be local and not visible to people linking to the library at all.

Doing a readelf on the two object files gives the expected result for this symbol. Weak (global) in the program, and local in the library. As far as I know this should link correctly?

library.a:

22: 0000000000000000    18 FUNC    LOCAL  DEFAULT    6 _ZStorSt13_Ios_OpenmodeS_

stringstream.o

22: 0000000000000000    18 FUNC    WEAK   DEFAULT    6 _ZStorSt13_Ios_OpenmodeS_

Is this a bug with GCC, that when I force a function to be brought in from the library, it has already discarded local symbols? Am I doing the right thing by changing symbols to local in my library?

silverscania asked Apr 24 '17 10:04
1 Answer

Groundwork

Let's fill out our knowledge of the offending symbol _ZStorSt13_Ios_OpenmodeS_ in your example.

readelf reports it identically in both library.o and stringstream.o:

$ readelf -s main.o | grep Bind
Num:    Value          Size Type    Bind   Vis      Ndx Name

$ readelf -s stringstream.o | grep _ZStorSt13_Ios_OpenmodeS_
25: 0000000000000000    18 FUNC    WEAK   DEFAULT    8 _ZStorSt13_Ios_OpenmodeS_

$ readelf -s library.o | grep _ZStorSt13_Ios_OpenmodeS_
25: 0000000000000000    18 FUNC    WEAK   DEFAULT    8 _ZStorSt13_Ios_OpenmodeS_

So it's a weak function symbol in both object files. It is visible for dynamic linkage (Vis = DEFAULT) in both files. It's defined in input linkage section #8 (Ndx = 8) in both files. Note that: it is defined in both object files, not just defined in one and maybe referenced in the other.

What sort of thing could that be? A global inline function. Its inline definition gets into both object files from one of your headers. g++ emits weak symbols for global inline functions to forestall multiple definition errors from the linker: weak symbols are allowed to be multiply defined in the linkage input (with any number of other weak definitions and at most one other strong definition).

Let's look at those linkage sections:

$ readelf -t stringstream.o
There are 31 section headers, starting at offset 0x130c0:

Section Headers:
  [Nr] Name
       Type              Address          Offset            Link
       Size              EntSize          Info              Align
       Flags
  ...
  ...
  [ 8] .text._ZStorSt13_Ios_OpenmodeS_
       PROGBITS               PROGBITS         0000000000000000  00000000000001b7  0
       0000000000000012 0000000000000000  0                 1
       [0000000000000206]: ALLOC, EXEC, GROUP

and:

$ readelf -t library.o 
There are 31 section headers, starting at offset 0x130d0:

Section Headers:
  [Nr] Name
       Type              Address          Offset            Link
       Size              EntSize          Info              Align
       Flags
  ...
  ...
  [ 8] .text._ZStorSt13_Ios_OpenmodeS_
       PROGBITS               PROGBITS         0000000000000000  00000000000001bc  0
       0000000000000012 0000000000000000  0                 1
       [0000000000000206]: ALLOC, EXEC, GROUP

They're identical, modulo position. The one notable point here is the section name itself, .text._ZStorSt13_Ios_OpenmodeS_, which is of the form .text.<function_name> and denotes a function in the text (i.e. program code) region.

We'd expect a function to be in the program code, but compare this with, say, your other function keepme_lib_function, which

$ readelf -s library.o | grep keepme_lib_function
26: 0000000000000000   246 FUNC    GLOBAL DEFAULT    3 keepme_lib_function

tells us is in section #3 of library.o. And section #3

$ readelf -t library.o
  ...
  ...
  [ 3] .text
       PROGBITS               PROGBITS         0000000000000000  0000000000000050  0
       0000000000000154 0000000000000000  0

is simply the .text section. Not .text.keepme_lib_function.

An input section of the form .text.<function_name>, like .text._ZStorSt13_Ios_OpenmodeS_, is a function-section. It's a code section that contains only the function <function_name>. So in both your stringstream.o and library.o, the function _ZStorSt13_Ios_OpenmodeS_ gets a function-section to itself.

This agrees with _ZStorSt13_Ios_OpenmodeS_ being an inline global function, and therefore weakly defined. Suppose a weak symbol has got multiple definitions in the linkage. Which definition will the linker pick? If any of the definitions is strong, the linker can allow at most one strong definition and must pick that one. But what if they're all weak? - which is what we've got here with _ZStorSt13_Ios_OpenmodeS_. In that case, the linker can pick any one of them, arbitrarily.

Either way, it will then have to discard all the rejected weak definitions of the symbol from the linkage. That's what is enabled by putting each weak definition of an inline global function in a function-section of its own. Then any competing definitions that the linker rejects can be dropped from the linkage by discarding the function-sections that contain them, with no collateral damage. That's why g++ emits those function-sections.

Finally let's identify the function:

$ c++filt _ZStorSt13_Ios_OpenmodeS_
std::operator|(std::_Ios_Openmode, std::_Ios_Openmode)
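If you'd rather demangle from inside a program than pipe names through c++filt, here is a minimal sketch. It assumes a GCC or Clang toolchain, since abi::__cxa_demangle comes from <cxxabi.h> and is not ISO C++. The helper name demangle is mine, purely for illustration.

```cpp
#include <cassert>
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang extension, not ISO C++)
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol
// name, or the input unchanged if the name cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0) {
        return mangled;  // not a valid mangled name, or allocation failed
    }
    assert(out != nullptr);  // status 0 means a buffer was returned
    std::string result(out);
    std::free(out);  // the buffer is malloc()-allocated by __cxa_demangle
    return result;
}
```

Calling demangle("_ZStorSt13_Ios_OpenmodeS_") should give the same text that c++filt printed above.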

We can sleuth for this signature under /usr/include/c++, and locate it (for me) in /usr/include/c++/6.3.0/bits/ios_base.h:

inline _GLIBCXX_CONSTEXPR _Ios_Openmode
  operator|(_Ios_Openmode __a, _Ios_Openmode __b)
  { return _Ios_Openmode(static_cast<int>(__a) | static_cast<int>(__b)); }

where indeed it is an inline global function, and whence its definition gets into both your stringstream.o and library.o via <iostream>.
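To make the shape of that function concrete without dragging in the library internals, here is a stand-in sketch. The enum OpenMode, its enumerators and the caller default_stream_mode are all hypothetical names of mine. Only the form of operator| mirrors the libstdc++ one quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for std::_Ios_Openmode; the enumerator names and
// values are illustrative, not those of libstdc++.
enum OpenMode { mode_in = 1 << 0, mode_out = 1 << 1 };

// Same shape as the libstdc++ operator: an inline function at namespace
// scope with external linkage. A translation unit that includes this
// definition and needs an out-of-line copy may emit it as a weak symbol
// in a function-section of its own.
inline constexpr OpenMode operator|(OpenMode a, OpenMode b) {
    return OpenMode(static_cast<int>(a) | static_cast<int>(b));
}

// Hypothetical caller, loosely analogous to library code that ORs two
// open modes together.
inline int default_stream_mode() {
    OpenMode m = mode_in | mode_out;
    assert(m != mode_in);  // both bits are expected to be set
    return static_cast<int>(m);
}
```

If a definition like this lived in a header included by two source files, both object files could end up carrying it, which is the situation you have with stringstream.o and library.o.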

MCVE

Now let's make a simpler specimen of your linkage problem.

a.cpp

inline unsigned foo()
{
    return 0xf0a;
}

unsigned keepme_a() {
    return foo();
}

b.cpp

inline unsigned foo()
{
    return 0xf0b;
}

unsigned keepme_b() {
    return foo();
}

main.cpp

extern unsigned keepme_a();
extern unsigned keepme_b();

#include <iostream>

int main() {
    std::cout << std::hex << keepme_a() << std::endl;
    std::cout << std::hex << keepme_b() << std::endl;
    return 0;
}

And a makefile to expedite experiments:

CXX := g++
CXXFLAGS := -g -O0
LDFLAGS := -g -L. -Wl,--trace-symbol='_Z3foov',-M=prog.map,--cref

ifdef STRIP
A_OBJ := a_stripped.o
B_OBJ := b_stripped.o
else
A_OBJ := a.o
B_OBJ := b.o
endif

ifdef B_A
OBJS := main.o $(B_OBJ) $(A_OBJ)
else
OBJS := main.o $(A_OBJ) $(B_OBJ)
endif


.PHONY: all clean

all: prog

%_stripped.o: %.o
    objcopy --keep-global-symbol '_Z8keepme_$(*)v' $< $@

prog : $(OBJS) 
    $(CXX) $(LDFLAGS) -o $@ $^

clean:
    rm -f *.o *.map prog

With this makefile, by default we will link a program prog from untampered-with object files main.o, a.o, b.o, in that order.

If we define STRIP on the make commandline, we'll replace a.o and b.o respectively with the object files a_stripped.o and b_stripped.o that have been doctored with:

objcopy --keep-global-symbol '_Z8keepme_$(*)v' $< $@

in which all symbols other than _Z8keepme_{a|b}v, (demangled = keepme_{a|b}) have been forced to be LOCAL.
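In case the pattern '_Z8keepme_$(*)v' looks cryptic: for the simplest kind of function, the mangled name is just a prefix, the length of the name, the name, and a v for the empty parameter list. A small sketch of that one case follows. The helper mangle_nullary is a hypothetical name of mine, and it deliberately ignores namespaces, parameters, templates and everything else the real mangling scheme handles.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper covering only the simplest Itanium C++ ABI case:
// a non-template function at global namespace scope taking no parameters.
// Result: "_Z" + <length of name> + <name> + "v" (empty parameter list).
std::string mangle_nullary(const std::string& name) {
    assert(!name.empty());
    return "_Z" + std::to_string(name.size()) + name + "v";
}
```

So mangle_nullary("keepme_a") yields the _Z8keepme_av that the makefile passes to objcopy, and mangle_nullary("foo") yields the _Z3foov that we ask the linker to trace.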

Furthermore, if we define B_A on the commandline, then the linkage order of a[_stripped].o and b[_stripped].o will be reversed.

Notice something about the definitions of the global inline function foo in a.cpp and b.cpp respectively: they're different. The former returns 0xf0a and the latter returns 0xf0b.

This makes any program we manage to build illegal per the C++ Standard: the One Definition Rule stipulates:

For an inline function ... a definition is required in every translation unit where it is odr-used.

and:

each definition consists of the same sequence of tokens (typically, appears in the same header file)

That's what the Standard stipulates, but the compiler of course cannot enforce any constraint on definitions in different translation units, and the GNU linker, ld, is not subject to the C++ Standard, or any language standard.

Let's do some experiments then.

The default build: make

$ make
g++ -g -O0   -c -o main.o main.cpp
g++ -g -O0   -c -o a.o a.cpp
g++ -g -O0   -c -o b.o b.cpp
g++ -g -L. -Wl,--trace-symbol='_Z3foov',-M=prog.map,--cref -o prog main.o a.o b.o
a.o: definition of _Z3foov
b.o: reference to _Z3foov

Success. And thanks to the linker diagnostic --trace-symbol='_Z3foov', we're told that the program defines _Z3foov (demangled = foo) in a.o and references it in b.o.

So we input two different definitions of foo in a.o and b.o and in the resulting prog, we have just one. The definition in a.o was chosen and the one in b.o was ditched.

We can check by running the program, since it can (illegally) show us which definition of foo it calls:

$ ./prog
f0a
f0a

Yes, keepme_a() (from a.o) and keepme_b() (from b.o) are both calling foo from a.o.

We've also asked the linker to generate the map file prog.map, and right near the top of that map file we find:

Discarded input sections

...
 .text._Z3foov  0x0000000000000000        0xb b.o
...

The linker got rid of the b.o definition of foo by discarding the function-section .text._Z3foov from b.o.

make B_A=Yes

This time we'll just reverse the linkage order of a.o and b.o:

$ make clean
rm -f *.o *.map prog 
$ make B_A=Yes
g++ -g -O0   -c -o main.o main.cpp
g++ -g -O0   -c -o b.o b.cpp
g++ -g -O0   -c -o a.o a.cpp
g++ -g -L. -Wl,--trace-symbol='_Z3foov',-M=prog.map,--cref -o prog main.o b.o a.o
b.o: definition of _Z3foov
a.o: reference to _Z3foov

Success again. But this time, _Z3foov gets its definition from b.o and is only referenced in a.o. Check that out:

$ ./prog
f0b
f0b

And now the map file contains:

Discarded input sections

...
 .text._Z3foov  0x0000000000000000        0xb a.o
...

The function-section .text._Z3foov was this time dropped from a.o

How does that work?

Well we can see how the GNU linker makes its arbitrary choice between multiple weak definitions of a global inline function: it just picks the first definition it finds in the linkage sequence and drops the rest. By varying the linkage order we can get an arbitrary one of the definitions to be linked.
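The first-come-first-kept behaviour we just observed can be mimicked with a toy model. To be clear, this is not how ld is implemented. It is only a sketch of the observable outcome, and InputObject and discarded_groups are names I've made up for it. Each object lists the group signatures of its function-sections (e.g. _Z3foov for .text._Z3foov).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model, not ld's implementation: each input object lists the group
// signatures it carries (e.g. "_Z3foov" for section .text._Z3foov).
struct InputObject {
    std::string name;
    std::vector<std::string> comdat_groups;
};

// Returns (object, group) pairs for the sections that would be discarded:
// the first occurrence of each signature is kept, later ones are dropped.
std::vector<std::pair<std::string, std::string>>
discarded_groups(const std::vector<InputObject>& link_order) {
    std::set<std::string> seen;
    std::vector<std::pair<std::string, std::string>> discarded;
    for (const InputObject& obj : link_order) {
        assert(!obj.name.empty());
        for (const std::string& group : obj.comdat_groups) {
            // insert().second is false if this signature was already kept
            if (!seen.insert(group).second) {
                discarded.emplace_back(obj.name, group);
            }
        }
    }
    return discarded;
}
```

Given main.o, a.o, b.o in that order, the model reports the _Z3foov group of b.o as discarded; with a.o and b.o swapped it reports the one from a.o, which matches the two map files.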

But, if an inline definition must be present in each translation unit that calls the function, as the Standard requires, how is the linker able to drop the inline definition from any arbitrary one of the translation units and get an object file that calls the definition inlined in some other one?

The compiler enables the linker to do it. Let's look at the assembly of a.cpp:

$ g++ -O0 -S a.cpp && cat a.s 
    .file   "a.cpp"
    .section    .text._Z3foov,"axG",@progbits,_Z3foov,comdat
    .weak   _Z3foov
    .type   _Z3foov, @function
_Z3foov:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $3850, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   _Z3foov, .-_Z3foov
    .text
    .globl  _Z8keepme_av
    .type   _Z8keepme_av, @function
_Z8keepme_av:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    call    _Z3foov
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   _Z8keepme_av, .-_Z8keepme_av
    .ident  "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
    .section    .note.GNU-stack,"",@progbits    

There, you see that symbol _Z3foov ( = foo) is given its function-section and classified weak:

    .section    .text._Z3foov,"axG",@progbits,_Z3foov,comdat
    .weak   _Z3foov

That symbol is assembled with the inline definition immediately following:

    _Z3foov:
    .LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $3850, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

Then in _Z8keepme_av ( = keepme_a), foo is referred to via _Z3foov,

call    _Z3foov

not via the local label .LFB0 of the inline definition. You'll see the pattern identically in the assembly of b.cpp. Thus, the function-section containing that inline definition can be discarded from either a.o or b.o, and _Z3foov resolved to the definition in the other one, and both keepme_a() and keepme_b() will call the surviving definition through _Z3foov - as we've seen.

So much for experimental successes. Next to experimental failures:

make STRIP=Yes

$ make clean
rm -f *.o *.map prog
$ make STRIP=Yes
g++ -g -O0   -c -o main.o main.cpp
g++ -g -O0   -c -o a.o a.cpp
objcopy --keep-global-symbol '_Z8keepme_av' a.o a_stripped.o
g++ -g -O0   -c -o b.o b.cpp
objcopy --keep-global-symbol '_Z8keepme_bv' b.o b_stripped.o
g++ -g -L. -Wl,--trace-symbol='_Z3foov',-M=prog.map,--cref -o prog main.o a_stripped.o b_stripped.o
`_Z3foov' referenced in section `.text' of b_stripped.o: defined in discarded section `.text._Z3foov[_Z3foov]' of b_stripped.o
collect2: error: ld returned 1 exit status
Makefile:28: recipe for target 'prog' failed
make: *** [prog] Error 1

That reproduces your issue. And we have the symmetrical failure also if we reverse the linkage order:

make STRIP=Yes B_A=Yes

$ make clean
rm -f *.o *.map prog 
$ make STRIP=Yes B_A=Yes
g++ -g -O0   -c -o main.o main.cpp
g++ -g -O0   -c -o b.o b.cpp
objcopy --keep-global-symbol '_Z8keepme_bv' b.o b_stripped.o
g++ -g -O0   -c -o a.o a.cpp
objcopy --keep-global-symbol '_Z8keepme_av' a.o a_stripped.o
g++ -g -L. -Wl,--trace-symbol='_Z3foov',-M=prog.map,--cref -o prog main.o b_stripped.o a_stripped.o
`_Z3foov' referenced in section `.text' of a_stripped.o: defined in discarded section `.text._Z3foov[_Z3foov]' of a_stripped.o
collect2: error: ld returned 1 exit status
Makefile:28: recipe for target 'prog' failed
make: *** [prog] Error 1

Why is that?

As you might now already see, it's because the objcopy intervention creates an insoluble problem for the linker, as you can observe after that last make:

$ readelf -s a_stripped.o | grep _Z3foov
16: 0000000000000000    11 FUNC    LOCAL  DEFAULT    6 _Z3foov

$ readelf -s b_stripped.o | grep _Z3foov
16: 0000000000000000    11 FUNC    LOCAL  DEFAULT    6 _Z3foov

The symbol still has a definition in a_stripped.o and also in b_stripped.o, but the definitions are now LOCAL, not available to satisfy external references from other object files. Both definitions are in input section #6:

$ readelf -t a_stripped.o
  ...
  ...
  [ 6] .text._Z3foov
       PROGBITS               PROGBITS         0000000000000000  0000000000000053  0
       000000000000000b 0000000000000000  0                 1
       [0000000000000206]: ALLOC, EXEC, GROUP


$ readelf -t b_stripped.o
  ...
  ...
  [ 6] .text._Z3foov
       PROGBITS               PROGBITS         0000000000000000  0000000000000053  0
       000000000000000b 0000000000000000  0                 1
       [0000000000000206]: ALLOC, EXEC, GROUP

which in each case remains a function-section .text._Z3foov

The linker can retain only one of the input .text._Z3foov function-sections for output in the .text section of prog and must discard the rest, to avert multiple definitions of _Z3foov. So it ticks the second-comer of those input sections, whether in a_stripped.o or b_stripped.o, to be discarded.

Say it's b_stripped.o that comes second. Our objcopy intervention has made _Z3foov local in both object files. So in keepme_b() the call to foo() can now only be resolved by the local definition - the one that's assembled after label .LFB0 in the assembly - which is in the .text._Z3foov function-section of b_stripped.o that is scheduled to be discarded. So that reference to foo() in b_stripped.o cannot be resolved in the program:

`_Z3foov' referenced in section `.text' of b_stripped.o: defined in discarded section `.text._Z3foov[_Z3foov]' of b_stripped.o

That's the explanation of your issue.
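If it helps to see that reasoning as code, here is a toy model of it. Again, this is not ld's algorithm, just a sketch of the argument above under simplifying assumptions: every object defines the one symbol of interest in its own function-section and also calls it. ObjectFile, Binding and unresolved_callers are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class Binding { Local, Weak };

// Toy model, not ld's implementation. Each object defines `symbol` in a
// function-section ".text.<symbol>" and calls it from its .text section.
struct ObjectFile {
    std::string name;
    std::string symbol;
    Binding binding;  // binding of the definition after any objcopy step
};

// Returns the names of objects whose call to the symbol cannot be resolved.
std::vector<std::string>
unresolved_callers(const std::vector<ObjectFile>& link_order) {
    std::set<std::string> kept_groups;  // signatures whose section was kept
    std::set<std::string> global_defs;  // symbols with a kept weak definition
    std::vector<std::string> failures;
    for (const ObjectFile& obj : link_order) {
        assert(!obj.symbol.empty());
        const bool section_kept = kept_groups.insert(obj.symbol).second;
        if (section_kept) {
            // First comer: its own definition survives and serves its call.
            if (obj.binding == Binding::Weak) {
                global_defs.insert(obj.symbol);
            }
            continue;
        }
        // Later comer: its function-section is discarded. A local
        // definition was the only candidate for the local call, so the
        // call dangles; a weak one can fall back on a kept weak definition.
        const bool rescued = obj.binding == Binding::Weak &&
                             global_defs.count(obj.symbol) > 0;
        if (!rescued) {
            failures.push_back(obj.name);
        }
    }
    return failures;
}
```

With two weak definitions nothing dangles. With two local ones, whichever object comes second is reported, as in the two failed builds. Your original case, weak in stringstream.o followed by local in library_stripped.o, reports library_stripped.o.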

But...

... you might say: Isn't it an oversight on the linker's part not to check, before it decides to discard a function-section, whether that section actually contains any global function definition that might possibly collide with others?

You could argue that, but not very persuasively. Function-sections are things that only compilers create in the real world, and they are created for only two reasons:-

  • To let the linker discard global functions that aren't called by the program, without collateral damage.

  • To let the linker discard rejected surplus definitions of global inline functions, without collateral damage.

So it's reasonable for the linker to operate on the assumption that a function-section only exists to contain a definition of a global function.

A compiler will never trouble the linker with the scenario you've engineered, because a compiler just won't emit linkage sections that contain only local symbols. In our MCVE, we've got the option of making foo a local symbol in either a.o or b.o or both without going behind the compiler's back. We can either make it a static function or, more C++-ishly, we can put it in an anonymous namespace. For a final experiment, let's do that:

a.cpp (reprise)

namespace {

inline unsigned foo()
{
    return 0xf0a;
}

}

unsigned keepme_a() {
    return foo();
}

b.cpp (reprise)

namespace {

inline unsigned foo()
{
    return 0xf0b;
}

}

unsigned keepme_b() {
    return foo();
}

Build and run:

$ make && ./prog
g++ -g -O0   -c -o a.o a.cpp
g++ -g -O0   -c -o b.o b.cpp
g++ -g -L. -Wl,--trace-symbol='_Z3foov',-M=prog.map,--cref -o prog main.o a.o b.o
f0a
f0b

Now naturally, keepme_a() and keepme_b() each call their local definition of foo, and:

$ nm -s a.o
000000000000000b T _Z8keepme_av
0000000000000000 t _ZN12_GLOBAL__N_13fooEv
$ nm -s b.o
000000000000000b T _Z8keepme_bv
0000000000000000 t _ZN12_GLOBAL__N_13fooEv

_Z3foov is gone from the global symbol tables [1], and:

$ echo \[$(readelf -t a.o | grep '.text._Z3foov')\]
[]
$ echo \[$(readelf -t b.o | grep '.text._Z3foov')\]
[]

the function-section .text._Z3foov is gone from both object files. The linker never knows of these local foos' existence.
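For completeness, the other option mentioned earlier, a static function, would look like this for a.cpp. It's a minimal sketch that I'd expect to behave like the anonymous-namespace version: foo gets internal linkage, so the compiler itself emits it as a local symbol. I haven't repeated the build log for this variant.

```cpp
// a.cpp, static variant (a sketch; b.cpp would differ only in the
// returned value and the name keepme_b).

// Internal linkage via `static`: each translation unit gets its own
// private foo, and no objcopy step is needed to hide it.
static inline unsigned foo() {
    return 0xf0a;
}

unsigned keepme_a() {
    return foo();
}
```

Either way, the decision to make foo local is taken in the source code, and the compiler emits object code that is consistent with it.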

You don't have the option of getting g++ to make _ZStorSt13_Ios_OpenmodeS_ ( = std::operator|(_Ios_Openmode __a, _Ios_Openmode __b)) a local symbol in your implementation of the Standard C++ library short of hacking ios_base.h, which of course you wouldn't.

But what you were trying to do was hack the linkage of this symbol from the Standard C++ library to make it local in one translation unit within your program and weakly global in another, and you blind-sided the linker, and yourself.

So...

Am I doing the right thing by changing symbols to local in my library?

No. Not unless they are symbols whose definitions you control, in your code, and then if you want them made local, make them local in the source code using one of the language facilities for the purpose, and let the compiler take care of the object code.

If you want to further minimise symbol bloat, see How to remove unused C/C++ symbols with GCC and ld? Safe techniques allow the compiler to produce the lean object files that are linked, and/or allow the linker to pare fat, or at least operate on the linked binary, post linkage.

Tampering with the object files between the compiler and the linker is tampering at your peril, and never more so than if it's tampering with the linkage of external library symbols.


[1] _ZN12_GLOBAL__N_13fooEv (demangled = (anonymous namespace)::foo()) has appeared, but it's local (t) not global (T) and is only in the symbol table at all because we're compiling with -O0.
Mike Kinghan answered Oct 27 '22 16:10