 

Why are unused objects in a STATIC lib included in the final binary when a SHARED lib references them?

Summary:

Functions cross-referenced between a STATIC and a SHARED lib lead to all objects of the STATIC lib (even unused ones!) being included in the final binary!

You don't understand what I mean, I suppose? :-p

Sit down and read the full story below! Names have been changed to protect the innocent. The example aims for simplicity and reproducibility.

Teaser: there's an SSCCE available! (Short, Self Contained, Correct (Compilable), Example: http://www.sscce.org/)

In the beginning, I had:

  • a binary (main) calling a function (func1a()) stored in a STATIC lib (libsub.a). main also has an internal function (mainsub()).

  • a STATIC lib (libsub.a) containing SEVERAL objects, each with several functions used by other sources.

Compiling main results in a binary containing ONLY a copy of the object(s) of the STATIC lib that contain the referenced functions. In the example below, main will only contain the functions of object shared1.o (because main calls func1a()) and NOT the functions of shared2.o (because nothing references them).

OK!

  main.c                 libsub.a    
+-------------+        +------------+
| main        |        | shared1.o  |
|  func1a()   | <----> |   func1a() |
|  mainsub()  |        |   func1b() |
+-------------+        |    ----    |
                       | shared2.o  |
                       |   func2a() |
                       |   func2b() |
                       +------------+

As an improvement, I wanted to allow 'external' people to override functions called in main with their own code, without having to recompile MY binary.

I don't provide the source anyway, nor my static lib.

To do so, I intended to provide a "ready to fill" function skeleton source. (Is that what's called a USER-EXIT?!) Using a SHARED / DYNAMIC lib could do that, IMHO. The functions that could be overridden are either internal to main (mainsub()) or shared functions (func1a() ...), and their overrides would be stored in a shared lib (.so) to be added/referenced at link time.

New sources were created, prefixed with 'c', containing the 'Client' version of the 'standard' functions. How the use of an overriding function is switched on or off is out of scope here: just take it as given that if UE is true, the override is called.

cmain.c is a new source containing Client_mainsub(), which could be called in place of mainsub().

cshared1.c is a new source containing Client_func1a(), which could be called in place of func1a(). In fact, every function in shared1.c could have its replacement in cshared1.c.

cshared2.c is a new source containing Client_func2a(), which could be called in place of func2a().

The overview becomes:

     main.c                          libsub.a                       clibsub.so
   +-----------------------+     +------------------------+     +--------------------+
   | main                  |     | shared1.o              |     | cshared1.o         |
   |  func1a() {}          |     |   func1a()             |     |   Client_func1a()  |
   |  mainsub()            | <-> |   { if UE              | <-> |    {do ur stuff }  |
   |  { if UE              |     |     Client_func1a()    |     |                    |
   |     Client_mainsub()  |     |     return           } |     | cshared2.o         |
   |     return           }|     |   func1b()             |     |   Client_func2a()  |
   +-----------------------+     |        -------         |    >|    {do ur stuff }  |
                ^                | shared2.o              |   / +--------------------+
    cmain.c     v                |   func2a()             |  /
   +--------------------+        |   { if UE              | /
   | cmain              |        |     Client_func2a()    |<
   |   Client_mainsub() |        |     return           } |
   |    {do ur stuff }  |        |   func2b()             |
   +--------------------+        +------------------------+

Here again, as main calls neither func2a() nor func2b(), the (STATIC) object shared2.o is not included in the binary, and no reference to (SHARED) Client_func2a() exists either. OK!


Finally, simply overriding functions was not enough (or was too much!). I wanted external people to be able to call my function (or not) ... but ALSO to do some stuff right BEFORE and/or right AFTER my function.

So instead of having func2a() stupidly replaced by Client_func2a(), we would have, roughly, in pseudocode:

       shared2.c              |          cshared2.c
                    (assume UE=true)
                              |
func2a()  {                   |Client_func2a() {
    if UE {                   |
        Client_func2a()      ==>    do (or not) some stuff PRE call
                              |
                              |     if (DOIT)  {            // activate or not standard call
                              |         UE=false 
                              |         func2a()            // do standard stuff
                              |         UE=true
                              |     } else  
                              |     {   do ur bespoke stuff }
                              |     
                              |     do (or not) some stuff POST call
                              | }
                             <==
    } else
      { do standard stuff }
}

Remember that cshared2.c is provided to other people, who may (or may not) add their own code to the provided skeleton.

(Note: setting UE to false and back to true in Client_func2a() avoids an infinite loop when it calls func2a()! ;-) )

Now comes my problem.

In that case, the resulting binary now includes the shared2.o object, even though main makes NO call to any function of shared2.c or cshared2.c!!!!!

After some searching, this seems to be caused by the cross calls/references:

shared2.o contains func2a() that may call Client_func2a()
cshared2.o contains Client_func2a() that may call func2a()

So why does the main binary contain shared2.o?

>dump -Tv main

main:

                        ***Loader Section***

                        ***Loader Symbol Table Information***
[Index]      Value      Scn     IMEX Sclass   Type           IMPid Name

[0]     0x00000000    undef      IMP     RW EXTref libc.a(shr_64.o) errno
[1]     0x00000000    undef      IMP     DS EXTref libc.a(shr_64.o) __mod_init
[2]     0x00000000    undef      IMP     DS EXTref libc.a(shr_64.o) exit
[3]     0x00000000    undef      IMP     DS EXTref libc.a(shr_64.o) printf
[4]     0x00000000    undef      IMP     RW EXTref libc.a(shr_64.o) __n_pthreads
[5]     0x00000000    undef      IMP     RW EXTref libc.a(shr_64.o) __crt0v
[6]     0x00000000    undef      IMP     RW EXTref libc.a(shr_64.o) __malloc_user_defined_name
[7]     0x00000000    undef      IMP     DS EXTref     libcmain.so Client_mainsub1
[8]     0x00000000    undef      IMP     DS EXTref   libcshared.so Client_func1b
[9]     0x00000000    undef      IMP     DS EXTref   libcshared.so Client_func1a
[10]    0x00000000    undef      IMP     DS EXTref   libcshared.so Client_func2b          <<< but why ??? ok bcoz func2b() is referenced ...
[11]    0x00000000    undef      IMP     DS EXTref   libcshared.so Client_func2a          <<< but why ??? ok bcoz func2a() is referenced ...
[12]    0x110000b50    .data    ENTpt     DS SECdef        [noIMid] __start
[13]    0x110000b78    .data      EXP     DS SECdef        [noIMid] func1a
[14]    0x110000b90    .data      EXP     DS SECdef        [noIMid] func1b
[15]    0x110000ba8    .data      EXP     DS SECdef        [noIMid] func2b                <<< but why this ? Not a single call is made in main ???
[16]    0x110000bc0    .data      EXP     DS SECdef        [noIMid] func2a                <<< but why this ? Not a single call is made in main ???

Note that simply commenting out the calls to func2a() (and func2b()) in cshared2.c solves the link issue (it breaks the cross reference)... but that's not an option, as I would like to keep a shared lib!?

The behavior happens on AIX 7.1 with IBM XL C/C++ 12.1, but it looks the same on Linux (Red Hat 5 + GCC 5.4, with some small changes to the compilation parameters).

IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)
Version: 12.01.0000.0000
Driver Version: 12.01(C/C++) Level: 120315
C Front End Version: 12.01(C/C++) Level: 120322
High-Level Optimizer Version: 12.01(C/C++) and 14.01(Fortran) Level: 120315
Low-Level Optimizer Version: 12.01(C/C++) and 14.01(Fortran) Level: 120321

So I figure this is surely a misunderstanding on my part. Can anyone explain?


As promised, here is the SSCCE. You can reproduce my problem by recreating/downloading the following small files and running go.sh (see the comments inside the script).

Edit1: added the code to the question itself, rather than on an external site, as suggested.

main.c

#include <stdio.h>
#include "inc.h"

/* Explicit prototypes instead of implicit declarations / implicit int */
extern void func1a (void), func1b (void);
extern void Client_mainsub1 (void);
void mainsub (void);

int UEXIT(char* file, char* func)
{
    printf("      UEXIT file=<%s>   func=<%s>\n",file,func);
    return 1;   /* always true for testing */
}


int main (void){
    printf(">>> main\n");
    func1a ();
    mainsub ();
    printf("<<< main\n");
    return 0;
}

void mainsub (void) {
    printf(">>> mainsub\n");

    if(UEXIT("main","mainsub")) {
        Client_mainsub1();
        return;
    }
    printf("<<< mainsub\n");
}

cmain.c

#include <stdio.h>
#include "inc.h"

void Client_mainsub1 () {
    printf(">>>>>> Client_mainsub1\n");
    printf("<<<<<< Client_mainsub1\n");
return;
}

inc.h

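extern int UEXIT(char * fileName, char * functionName);

/* Prototypes for the standard (libsub.a) and client (*.so) functions,
   so that no source relies on implicit function declarations. */
extern void func1a(void), func1b(void), func2a(void), func2b(void);
extern void Client_mainsub1(void);
extern void Client_func1a(void), Client_func1b(void);
extern void Client_func2a(void), Client_func2b(void);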

shared1.c

#include <stdio.h>
#include "inc.h"

void func1a (){
    printf(">>>>> func1a\n");
    if(UEXIT("main","func1a")) {
        Client_func1a();
        return;
    }
    printf("<<<<< func1a\n");
}

void func1b (){
    printf(">>>>> func1b\n");
    if(UEXIT("main","func1b")){
        Client_func1b();
        return;
    }
    printf("<<<<< func1b\n");
}

shared2.c

#include <stdio.h>
#include "inc.h"

void func2a (){
    printf(">>>>> func2a\n");
    if(UEXIT("main","func2a")) {
        Client_func2a();
        return;
    }
    printf("<<<<< func2a\n");
}

void func2b (){
    printf(">>>>> func2b\n");
    if(UEXIT("main","func2b")){
        Client_func2b();
        return;
    }
    printf("<<<<< func2b\n");
}

cshared1.c

#include <stdio.h>
#include "inc.h"

void Client_func1a () {
    int standardFunctionCall = 0;
    printf("\t>>>> Client_func1a\n");
    if (standardFunctionCall) {
        func1a();
    }
    printf("\t<<< Client_func1a\n");
    return;
}


void Client_func1b () {
    int standardFunctionCall = 0;
    printf("\t>>>> Client_func1b\n");
    if (standardFunctionCall) {
        func1b();
    }
    printf("\t<<< Client_func1b\n");
    return;
}

cshared2.c

#include <stdio.h>
#include "inc.h"

void Client_func2a () {
    int standardFunctionCall = 0;
    printf("\t>>>> Client_func2a\n");
    if (standardFunctionCall) {
        func2a();           /* !!!!!! comment this to avoid crossed link with shared2.c !!!!! */
    }
    printf("\t<<< Client_func2a\n");
    return;
}


void Client_func2b () {
    int standardFunctionCall = 0;
    printf("\t>>>> Client_func2b\n");
    if (standardFunctionCall) {
        func2b();           /* !!!!!! ALSO comment this to avoid crossed link with shared2.c !!!!! */
    }
    printf("\t<<< Client_func2b\n");
    return;
}

go.sh

#!/bin/bash

## usage :
## . ./go.sh
## so that the redefinition of LIBPATH is propagated to calling ENV ...
##    otherwise :  "Dependent module libcshared.so could not be loaded."


# default OBJECT_MODE to 64 bit (avoid explicitly setting -X64 options...)
export OBJECT_MODE=64
export LIBPATH=.:$LIBPATH

# Compile client functions for target binary
cc -q64 -c -o cmain.o cmain.c

# (1) Shared lib for internal function
cc -G -q64 -o libcmain.so cmain.o


# Compile common functions
cc -c shared2.c shared1.c

# Compile client common functions overwrite
cc -c cshared2.c cshared1.c


# (2) Build libsub.a for common functions (STATIC)
ar -rv libsub.a  shared1.o shared2.o

# (3) Build libcshared.so for client common functions overwrite (SHARED)
cc -G -q64 -o libcshared.so cshared1.o cshared2.o


# Finally build the binary using above (1) (2) (3)
# main only calls func1a(), so it should only include object shared1.o
# But in practice shared2.o is also included if cshared2 references a possible call to func2a()/func2b() in shared2 !!!!????
#   Check this with "nm main |grep shared2" or "nm main |grep func2" or "dump -Tv main |grep func2"
cc -q64 -o main main.c -bstatic libsub.a -bshared libcmain.so  libcshared.so

# result is the same without specifying -bstatic or -bshared
#cc -q64 -o main2 main.c libsub.a libcmain.so  libcshared.so


#If I split libcshared.so into libcshared1.so and libcshared2.so, the result is the same:
#cc -G -q64 -o libcshared1.so cshared1.o
#cc -G -q64 -o libcshared2.so cshared2.o
#cc -q64 -o main4 main.c -bstatic libsub.a -bshared libcmain.so  libcshared1.so libcshared2.so

#If I do not include libcshared2.so, the binary of course works fine, without any reference to cshared2 or shared2.
# So why does the linker choose to add STATIC shared2.o only if libcshared2.so is listed?
# Is there a way to avoid adding this unused code?
#cc -q64 -o main4 main.c -bstatic libsub.a -bshared libcmain.so  libcshared1.so

Edit2 : added RedHat version of go.sh script as requested

gored.sh

## usage :
## . ./gored.sh
## so that the redefinition of LD_LIBRARY_PATH is propagated to calling ENV ...
##    otherwise :  "Dependent module libcshared.so could not be loaded."
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH

# Compile client functions for target binary
gcc -fPIC -c cmain.c

# (1) Shared lib for internal function
gcc -shared -o libcmain.so cmain.o


# Compile common functions
gcc -c shared2.c shared1.c

# Compile client common functions overwrite
gcc -fPIC -c cshared2.c cshared1.c


# (2) Build libsub.a for common functions (STATIC)
ar -rv libsub.a  shared1.o shared2.o

# (3) Build libcshared.so for client common functions overwrite (SHARED)
gcc -shared -o libcshared.so cshared1.o cshared2.o


# Finally build the binary using above (1) (2) (3)
# main only calls func1a(), so it should only include object shared1.o
# But in practice shared2.o is also included if cshared2 references a possible call to func2a()/func2b() in shared2 !!!!????
#   Check this with "nm main |grep shared2" or "nm main |grep func2"
gcc -o main main.c libcmain.so  libcshared.so libsub.a

#If I split libcshared.so into libcshared1.so and libcshared2.so, the result is the same:
gcc -shared -o libcshared1.so cshared1.o
gcc -shared -o libcshared2.so cshared2.o
cc -o main2 main.c libcmain.so  libcshared1.so libcshared2.so libsub.a

#If I do not include libcshared2.so, the binary of course works fine, without any reference to cshared2 or shared2.
# So why does the linker choose to add STATIC shared2.o only if libcshared2.so is listed?
# Is there a way to avoid adding this unused code?
cc -o main3 main.c libcmain.so  libcshared1.so libsub.a

Or here are all the files above (without gored.sh) in a single .tar.bz2 (6 KB):

https://pastebin.com/KsaqacAu

Just copy/paste it into a new file (e.g. poc.uue), then type

uudecode poc.uue

and you should get poc.tar.bz2.

Decompress and untar it, go into the poc folder and run

. ./go.sh

then

dump -Tv main 

or, under Red Hat,

nm main

Example of the result after gored.sh:

poc>nm main |grep func2
                 U Client_func2a
                 U Client_func2b
0000000000400924 T func2a
000000000040095d T func2b
poc>nm main2 |grep func2
                 U Client_func2a
                 U Client_func2b
0000000000400934 T func2a
000000000040096d T func2b
poc>nm main3 |grep func2
poc>

Edit3: ASCII ART! :-)
Here's the 'visual' final state, with the unused objects/references that I think the linker is wrong to include, or at least not smart enough to detect as unused. Maybe that's normal, or maybe there's an option to avoid having unused static code in the final binary. This doesn't look like a complex situation, since the code surrounded and tagged 'UNUSED !?' is linked to nothing? Isn't it?

     main.c                          libsub.a                                clibsub.so                          
   +-----------------------+       +-------------------------+           +-----------------------------+         
   | main                  |       | +---------------------+ |           | +-------------------------+ |         
   |  func1a();  <-------------\   | |shared1.o            | |           | | cshared1.o              | |         
   |  mainsub()            |    \------>func1a() { <-------------+     /-----> Client_func1a() {     | |         
   |  { if UE {            |       | |   if UE {           | |   |    /  | |      PRE-stuff          | |         
   |     Client_mainsub()  |       | |     Client_func1a() <-----C---/   | |      if (DOIT) {        | |         
   |     return  ^         |       | |     return          | |   |       | |        UE=false         | |         
   |    }        |         |       | |   } else {          | |   +----------------> func1a()         | |         
   |  }          |         |       | |     do std stuff    | |           | |        UE=true          | |         
   +-------------|---------+       | |   }                 | |           | |      } else {           | |         
                 |                 | |                     | |           | |        do bespoke stuff | |         
                 |                 | |  func1b() {         | |           | |      }                  | |         
                 |                 | |     same as above   | |           | |      POST-stuff         | |         
                 |                 | |  }                  | |           | |   }                     | |         
                 |                 | +---------------------+ |           | |   Client_func1b() {}    | |         
                 |                 |                         |           | +-------------------------+ |         
                 |              ***|*******U*N*U*S*E*D**?!***|*****U*N*U*S*E*D**?!*******U*N*U*S*E*D**?!****     
                 |              *  | +---------------------+ |           | +-------------------------+ |   *     
                 |              U  | |shared2.o            | |           | | cshared2.o              | |   U     
                 |              *  | |  func2a() { <-------------+     /-----> Client_func2a() {     | |   *     
                 |              N  | |   if UE {           | |   |    /  | |      PRE-stuff          | |   N     
    cmain.so     |              *  | |     Client_func2a() <-----C---/   | |      if (DOIT) {        | |   *     
   +-------------|------+       U  | |     return          | |   |       | |        UE=false         | |   U     
   | cmain.o     v      |       *  | |   } else {          | |   +----------------> func2a()         | |   *     
   |   Client_mainsub() |       S  | |     do std stuff    | |           | |        UE=true          | |   S     
   |    {do ur stuff }  |       *  | |   }                 | |           | |      } else {           | |   *     
   +--------------------+       E  | |                     | |           | |        do bespoke stuff | |   E     
                                *  | |  func2b() {         | |           | |      }                  | |   *     
                                D  | |     same as above   | |           | |      POST-stuff         | |   D     
                                *  | |  }                  | |           | |   Client_func2b() {}    | |   *     
                                *  | +---------------------+ |           | +-------------------------+ |   *     
                                ?  +-------------------------+           +---------------------------+ |   ?     
                                !                                                                          !     
                                *********U*N*U*S*E*D**?!*************U*N*U*S*E*D**?!******U*N*U*S*E*D**?!***     

Any constructive answer that puts me on the right track is welcome.

Thanks.

NoobInside asked Feb 02 '18 14:02




1 Answer

Here is a much simplified illustration of the linker behaviour that is puzzling you:

main.c

extern void foo(void);

int main(void)
{
    foo();
    return 0;
}

foo.c

#include <stdio.h>

void foo(void)
{
    puts(__func__);
}

bar.c

#include <stdio.h>

extern void do_bar(void);

void bar(void)
{
    do_bar();
}

do_bar.c

#include <stdio.h>

void do_bar(void)
{
    puts(__func__);
}

Let's compile all those source files to object files:

$ gcc -Wall -c main.c foo.c bar.c do_bar.c

Now we'll try to link a program, like so:

$ gcc -o prog main.o foo.o bar.o
bar.o: In function `bar':
bar.c:(.text+0x5): undefined reference to `do_bar'

The undefined function do_bar is referenced only in the definition of bar, and bar is not referenced in the program at all. Why then the linkage failure?

Quite simply, this linkage failed because we told the linker to link bar.o into the program; so it did; and bar.o contains the definition of bar, which references do_bar, which is not defined in the linkage. bar is not referenced, but do_bar is - by bar, which is linked in the program.

By default, the linker demands that any symbol that is referenced in the linkage of a program is defined in the linkage. If we compel it to link the definition of bar, then it will demand a definition of do_bar, because without a definition of do_bar it hasn't actually got a definition of bar. If it links a definition of bar, it does not ask whether we need to link it, and then permit undefined references to do_bar if the answer is No.

The linkage failure is of course fixable with:

$ gcc -o prog main.o foo.o bar.o do_bar.o
$ ./prog
foo

Now in this illustration, linking bar.o in the program is simply gratuitous. We can also link successfully just by not telling the linker to link bar.o.

$ gcc -o prog main.o foo.o
$ ./prog
foo

bar.o and do_bar.o are both superfluous for executing main, but the program can only be linked with both, or with neither.

But suppose foo and bar were defined in the same file?

They might be defined in the same object file, foobar.o:

$ ld -r -o foobar.o foo.o bar.o

And then:

$ gcc -o prog main.o foobar.o
foobar.o: In function `bar':
(.text+0x18): undefined reference to `do_bar'
collect2: error: ld returned 1 exit status

Now, the linker cannot link the definition of foo without also linking the definition of bar. So once again, we have to link a definition of do_bar:

$ gcc -o prog main.o foobar.o do_bar.o
$ ./prog
foo

Linked like this, prog contains definitions of foo, bar and do_bar:

$ nm prog | grep -e foo -e bar
000000000000065d T bar
0000000000000669 T do_bar
000000000000064a T foo

(T = defined function symbol).

Equally, foo and bar might be defined in the same shared library:

$ gcc -Wall -fPIC -c foo.c bar.c
$ gcc -shared -o libfoobar.so foo.o bar.o

and then this linkage:

$ gcc -o prog main.o -L. -lfoobar -Wl,-rpath=$(pwd)
./libfoobar.so: undefined reference to `do_bar'
collect2: error: ld returned 1 exit status

fails just as before, and is fixable in the same way:

$ gcc -o prog main.o do_bar.o -L. -lfoobar -Wl,-rpath=$(pwd)
$ ./prog
foo

When we link the shared library libfoobar.so rather than the object file foobar.o, our prog has a different symbol table:

$ nm prog | grep -e foo -e bar
00000000000007aa T do_bar
             U foo

This time, prog does not contain definitions of either foo or bar. It contains an undefined reference (U) to foo, because it calls foo, and of course that reference will now be satisfied, at runtime, by the definition in libfoobar.so. There's not even an undefined reference to bar, nor should there be, since the program never calls bar.

But still, prog contains the definition of do_bar, which is now unreferenced from all functions in the symbol table.

This echoes your own SSCCE, but in a less convoluted way. In your case:

  • The object file libsub.a(shared2.o) is linked into the program to provide definitions for func2a and func2b.

  • Those definitions must be found and linked because they are referenced, respectively, in the definitions of Client_func2a and Client_func2b, which are defined in libcshared.so.

  • libcshared.so must be linked to provide a definition of Client_func1a.

  • A definition of Client_func1a must be found and linked because it is referenced from the definition of func1a.

  • And func1a is called by main.

That's why we see:

$ nm main | grep func2
                 U Client_func2a
                 U Client_func2b
00000000004009f7 T func2a
0000000000400a30 T func2b

in the symbol table of your program.

It is not at all unusual for definitions to be linked into a program for functions that it does not call. It usually happens in the way we've seen: the linkage, recursively resolving symbol references starting with main, discovers that it needs a definition of f, which it can only get by linking some object file file.o, and with file.o it also links a definition of function g, which is never called.

What is rather odd is to end up with a program like your main and like my last version of prog, which contains a definition of an uncalled function (e.g. do_bar) that is linked to resolve references from the definition of another uncalled function (e.g. bar) that is not defined in the program. Even if there are redundant function definitions, usually we can chain them back to one or more object files in the linkage where the first redundant definitions are pulled in along with some necessary definitions.

This oddity is caused, in a case like:

gcc -o prog main.o do_bar.o -L. -lfoobar -Wl,-rpath=$(pwd)

because the first redundant function definition that must be linked (bar) is provided by linking a shared library, libfoobar.so, while the definition of do_bar that is demanded by bar is not in that shared library, or any other shared library, but in an object file.

The definition of bar that's provided by libfoobar.so will stay there when the program is linked with that shared library. It won't be physically linked into the program. That's the nature of dynamic linkage. But any object file required by the linkage - whether it's a free-standing object file like do_bar.o or one that the linker extracts from an archive like libsub.a(shared2.o) - can only be linked physically into the program. So the redundant do_bar appears in the symbol table of prog. But the redundant bar, which explains why do_bar is there, isn't there. It is in the symbol table of libfoobar.so.

When you discover dead code in your program, you might like the linker to be smarter. Usually, it can be smarter, at the cost of some extra effort. You need to ask it to garbage-collect sections, and before that, you need to ask the compiler to prepare the way by generating data-sections and function-sections in the object files. See How to remove unused C/C++ symbols with GCC and ld?, and the answer

But this way of pruning dead code will not work in the unusual case where the dead code is linked in the program to satisfy redundant references from a shared library required by the linkage. The linker can only recursively garbage-collect unused sections from the ones that it outputs into the program, and it only outputs sections that are input from object files, not from shared libraries that are to be dynamically linked.

The right way to avoid the dead code in your main and my prog is not to do that peculiar kind of linkage in which a shared library will contain undefined references that the program does not call but that have to be resolved by linking dead object code into your program.

Instead, when you build a shared library, either don't leave any undefined references in it, or else leave only undefined references that shall be satisfied by its own dynamic dependencies.

So, the proper way to build my libfoobar.so is:

$ gcc -shared -o libfoobar.so foo.o bar.o do_bar.o

This gives me a shared library that has an API of:

void foo(void);
void bar(void);

for whoever wants either or both of them, and no undefined references. Then I build my program that is a client just of foo:

$ gcc -o prog main.o -L. -lfoobar -Wl,-rpath=$(pwd)
$ ./prog
foo

And it contains no dead code:

$ nm prog | grep -e foo -e bar
                 U foo

Similarly, if you build your libcshared.so without undefined references, like:

$ gcc -c -fPIC shared2.c shared1.c
$ ar -crs libsub.a  shared1.o shared2.o
$ gcc -shared -o libcshared.so cshared1.o cshared2.o -L. -lsub

and then link your program:

$ gcc -o main main.c libcmain.so  libcshared.so

it too will have no dead code:

$ nm main | grep func
                 U func1a

If you dislike the fact that libsub.a(shared1.o) and libsub.a(shared2.o) become physically linked into libcshared.so by this solution, then take the other orthodox approach to linking a shared library: leave all the func* functions undefined in libcshared.so, and make libsub a shared library too, which then is a dynamic dependency of libcshared.so.

Mike Kinghan answered Oct 31 '22 23:10