Keep all exported symbols when creating a shared library from a static library

I am creating a shared library from a static library for which I do not have the source code.

Many Stack Overflow questions provide answers on how to do that:

gcc -shared -o libxxx.so -Wl,--whole-archive libxxx.a -Wl,--no-whole-archive

However, some public functions of the static library are included as hidden functions in the shared library:

$ nm --defined-only libxxx.a | grep __intel_cpu_indicator_init
0000000000000000 T __intel_cpu_indicator_init
$ nm libxxx.so | grep __intel_cpu_indicator_init
00000000030bb160 t __intel_cpu_indicator_init

The __intel_cpu_indicator_init symbol went from exported to hidden.

It is not the only symbol that was hidden in the process:

$ nm libxxx.a | grep ' T ' | wc -l
37969
$ nm libxxx.so | grep ' T ' | wc -l
37548
$ nm libxxx.a | grep ' t ' | wc -l
62298
$ nm libxxx.so | grep ' t ' | wc -l
62727

Note that 37969 + 62298 = 100267 and 37548 + 62727 = 100275.
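The arithmetic above can be checked with a short Python sketch. The helper `count_nm_types` is a hypothetical name introduced here for illustration and assumes the default `nm` output format (`<value> <type> <name>`); the figures are the ones quoted above.

```python
from collections import Counter

def count_nm_types(nm_output):
    """Count symbols per nm type letter (e.g. 'T' global text, 't' local text)."""
    counts = Counter()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols print as "<value> <type> <name>",
        # undefined ones as "<type> <name>"; other lines are skipped.
        if len(parts) == 3 and len(parts[1]) == 1:
            counts[parts[1]] += 1
        elif len(parts) == 2 and len(parts[0]) == 1:
            counts[parts[0]] += 1
    return counts

# Figures quoted in the question.
archive = {"T": 37969, "t": 62298}
shared = {"T": 37548, "t": 62727}

lost_global = archive["T"] - shared["T"]    # 'T' symbols missing from the .so
gained_local = shared["t"] - archive["t"]   # extra 't' symbols in the .so
extra_total = sum(shared.values()) - sum(archive.values())
print(lost_global, gained_local, extra_total)
```

So 421 global text symbols disappeared, 429 local ones appeared, and the shared library has 8 more text symbols in total than the archive.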

Is there anything I can do to have the linker produce a shared library in which all public symbols from the static library are also public?

Asked Feb 13 '19 07:02 by Étienne


1 Answer

What you observe results when some of the global symbol definitions in some of the object files archived in libxxx.a were compiled with the function attribute or variable attribute visibility("hidden").

This attribute has the effect that when the object file containing the global symbol definition is linked into a shared library:

  • The linkage of the symbol is changed from global to local in the static symbol table (.symtab) of the output shared library, so that when that shared library is linked with anything else, the linker cannot see the definition of the symbol.
  • The symbol definition is not added to the dynamic symbol table (.dynsym) of the output shared library (which by default it would be) so that when the shared library is loaded into a process, the loader is likewise unable to find a definition of the symbol.

In short, the global symbol definition in the object file is hidden for the purposes of dynamic linkage.
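As a rough model of the two effects just listed, here is a minimal Python sketch. The numeric constants are the binding and visibility codes from the ELF specification; the functions `decode` and `in_shared_library` are hypothetical names introduced for illustration, and the model ignores anything else that can affect exporting (linker version scripts, for instance).

```python
# Binding and visibility codes from the ELF specification.
STB_LOCAL, STB_GLOBAL, STB_WEAK = 0, 1, 2
STV_DEFAULT, STV_INTERNAL, STV_HIDDEN, STV_PROTECTED = 0, 1, 2, 3

def decode(st_info, st_other):
    """Split the raw symbol-table bytes into (binding, type, visibility)."""
    binding = st_info >> 4
    sym_type = st_info & 0xF
    visibility = st_other & 0x3
    return binding, sym_type, visibility

def in_shared_library(binding, visibility):
    """Model what happens to a defined symbol when linked into a shared library.

    Returns (binding in the output .symtab, whether it appears in .dynsym).
    """
    if binding == STB_LOCAL:
        return STB_LOCAL, False
    if visibility in (STV_HIDDEN, STV_INTERNAL):
        # Hidden globals are demoted to local and left out of .dynsym.
        return STB_LOCAL, False
    return binding, True

# A GLOBAL FUNC symbol (st_info 0x12) with HIDDEN visibility (st_other 2).
binding, _, visibility = decode(0x12, 2)
print(in_shared_library(binding, visibility))
```

For the hidden global this prints `(0, False)`: local in .symtab and not present in .dynsym.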

Check this out with:

$ readelf -s libxxx.a | grep HIDDEN

and I expect you to get hits for the unexported global symbols. If you don't, you need read no further because I have no other explanation of what you see and wouldn't count on any workaround I suggested not to shoot you in the foot.
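If you would rather have a list of names than raw grep hits, the readelf output can be filtered with a few lines of Python. This is only a sketch: `hidden_globals` is a hypothetical helper, and it assumes the usual eight-column `readelf -s` layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name).

```python
def hidden_globals(readelf_output):
    """Return names of defined GLOBAL symbols with HIDDEN visibility."""
    names = []
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows start with "<number>:" and carry at least eight columns.
        if len(fields) >= 8 and fields[0].endswith(":") and fields[0][:-1].isdigit():
            bind, vis, ndx, name = fields[4], fields[5], fields[6], fields[7]
            if bind == "GLOBAL" and vis == "HIDDEN" and ndx != "UND":
                names.append(name)
    return names

# Hand-written sample in the readelf -s layout.
sample = """
Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    10: 0000000000000000    19 FUNC    GLOBAL HIDDEN     1 bb
    11: 0000000000000000    19 FUNC    GLOBAL DEFAULT    1 aa
"""
print(hidden_globals(sample))
```

In practice the text would come from running `readelf -s libxxx.a` and capturing its output.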

Here is an illustration:

a.c

#include <stdio.h>

void aa(void)
{
    puts(__func__);
}

b.c

#include <stdio.h>

void __attribute__((visibility("hidden"))) bb(void)
{
    puts(__func__);
}

de.c

#include <stdio.h>

void __attribute__((visibility("default"))) dd(void)
{
    puts(__func__);
}

void ee(void)
{
    puts(__func__);
}

We'll compile a.c and b.c like so:

$ gcc -Wall -c a.c b.c

And we can see that symbols aa and bb are defined and global in their respective object files:

$ nm --defined-only a.o b.o

a.o:
0000000000000000 T aa
0000000000000000 r __func__.2361

b.o:
0000000000000000 T bb
0000000000000000 r __func__.2361

But we can also observe this difference:

$ readelf -s a.o

Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
    10: 0000000000000000    19 FUNC    GLOBAL DEFAULT    1 aa
    ...

as compared with:

$ readelf -s b.o

Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
    10: 0000000000000000    19 FUNC    GLOBAL HIDDEN     1 bb
    ...

aa is a GLOBAL symbol with DEFAULT visibility and bb is a GLOBAL symbol with HIDDEN visibility.

We'll compile de.c differently:

$ gcc -Wall -fvisibility=hidden -c de.c

Here, we're instructing the compiler that any symbol shall be given hidden visibility unless a countervailing visibility attribute is specified for it in the source code. And accordingly we see:
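The rule can be modelled in a couple of lines of Python. This is only an illustration of the precedence described above, not of how the compiler is implemented; `effective_visibility` is a hypothetical name.

```python
def effective_visibility(attribute=None, fvisibility="default"):
    """An explicit visibility attribute wins; otherwise -fvisibility applies."""
    return attribute if attribute is not None else fvisibility

# de.c compiled with -fvisibility=hidden: dd carries an attribute, ee does not.
de_symbols = {"dd": "default", "ee": None}
result = {name: effective_visibility(attr, fvisibility="hidden").upper()
          for name, attr in de_symbols.items()}
print(result)
```

This gives DEFAULT for dd and HIDDEN for ee.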

$ readelf -s de.o

Symbol table '.symtab' contains 15 entries:
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
    ...
    11: 0000000000000000    19 FUNC    GLOBAL DEFAULT    1 dd
    ...
    14: 0000000000000013    19 FUNC    GLOBAL HIDDEN     1 ee

Archiving these object files in a static library changes them in no way:

$ ar rcs libabde.a a.o b.o de.o

And then if we link all of them into a shared library:

$ gcc -o libabde.so -shared -Wl,--whole-archive libabde.a -Wl,--no-whole-archive

we find that:

$ readelf -s libabde.so | egrep '(aa|bb|dd|ee|Symbol table)'
Symbol table '.dynsym' contains 8 entries:
     6: 0000000000001105    19 FUNC    GLOBAL DEFAULT   12 aa
     7: 000000000000112b    19 FUNC    GLOBAL DEFAULT   12 dd
Symbol table '.symtab' contains 59 entries:
    45: 0000000000001118    19 FUNC    LOCAL  DEFAULT   12 bb
    51: 000000000000113e    19 FUNC    LOCAL  DEFAULT   12 ee
    54: 0000000000001105    19 FUNC    GLOBAL DEFAULT   12 aa
    56: 000000000000112b    19 FUNC    GLOBAL DEFAULT   12 dd

bb and ee, which were GLOBAL with HIDDEN visibility in the object files, are LOCAL in the static symbol table of libabde.so and are absent altogether from its dynamic symbol table.
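To list exactly which global functions fail to be exported, one can compare the names reported by `nm --defined-only` on the archive with those reported by `nm -D --defined-only` on the shared library. The Python sketch below does this on hand-written sample text for the illustration; `text_globals` is a hypothetical helper and only considers symbols of nm type `T`.

```python
def text_globals(nm_output):
    """Return the names of global text ('T') symbols in nm output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "T":
            names.add(parts[2])
    return names

# Hand-written samples in nm's default format (assumed, not captured output).
archive_nm = """
a.o:
0000000000000000 T aa

b.o:
0000000000000000 T bb

de.o:
0000000000000000 T dd
0000000000000013 T ee
"""
shared_nm = """
0000000000001105 T aa
000000000000112b T dd
"""

not_exported = sorted(text_globals(archive_nm) - text_globals(shared_nm))
print(not_exported)
```

For the sample text the difference is bb and ee.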

In this light, you may wish to re-evaluate your mission:

The symbols that have been given hidden visibility in the object files in libxxx.a have been hidden because the person who compiled them had a reason for wishing to conceal them from dynamic linkage. Do you have a countervailing need to export them for dynamic linkage? Or do you maybe just want to export them because you've noticed that they're not exported and don't know why not?

If you nonetheless want to unhide the hidden symbols, and cannot change the source code of the object files archived in libxxx.a, your least worst resort is to:

  • Extract each object file from libxxx.a
  • Doctor it to replace HIDDEN with DEFAULT visibility on its global definitions
  • Put it into a new archive libyyy.a
  • Then use libyyy.a instead of libxxx.a.

The binutils tool for doctoring object files is objcopy. But objcopy has no operations to directly manipulate the dynamic visibility of a symbol and you'd have to settle for a circuitous kludge that "achieves the effect of" unhiding the hidden symbols:

  • With objcopy --redefine-sym, rename each hidden global symbol S as, say, __hidden__S.
  • With objcopy --add-symbol, add a new global symbol S that has the same value as __hidden__S but gets DEFAULT visibility by default.

ending up with two symbols with the same definition: the original hidden one and a new unhidden alias for it.
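For concreteness, here is a Python sketch that only builds the two objcopy command lines for one symbol and does not run them. The helper `alias_commands` is hypothetical, and the section name `.text`, the value `0` and the `global,function` flags are assumptions chosen for the bb example; for a real object file you would read the section and value from its symbol table.

```python
def alias_commands(objfile, symbol, section, value, prefix="__hidden__"):
    """Build objcopy invocations that rename `symbol` and add a new alias.

    section and value must be those of the original symbol (for example as
    shown by readelf -s); they are supplied by the caller, not discovered here.
    """
    renamed = prefix + symbol
    return [
        # Step 1: rename the hidden global S to __hidden__S.
        ["objcopy", "--redefine-sym", f"{symbol}={renamed}", objfile],
        # Step 2: add a new global S at the same section-relative value.
        ["objcopy", "--add-symbol",
         f"{symbol}={section}:{value:#x},global,function", objfile],
    ]

for cmd in alias_commands("b.o", "bb", ".text", 0):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` once you have checked that the section and value are right for your object file.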

Preferable to that would be a means of simply and solely changing the visibility of a symbol in an ELF object file, and such a means is at hand in the LIEF library (Library to Instrument Executable Formats), a Swiss Army chainsaw for object and executable file alterations [1].

Here is a Python script that calls on pylief, the LIEF Python module, to unhide the hidden globals in an ELF object file:

unhide.py

#!/usr/bin/python
# unhide.py - Replace hidden with default visibility on global symbols defined
#   in an ELF object file

import argparse, sys, lief
from lief.ELF import SYMBOL_BINDINGS, SYMBOL_VISIBILITY, SYMBOL_TYPES

def warn(msg):
    sys.stderr.write("WARNING: " + msg + "\n")

def unhide(objfile_in, objfile_out = None, namedsyms=None):
    if not objfile_out:
        objfile_out = objfile_in
    binary = lief.parse(objfile_in)
    allsyms = { sym.name for sym in binary.symbols }
    selectedsyms = set([])
    nasyms = { sym.name for sym in binary.symbols if \
                            sym.type == SYMBOL_TYPES.NOTYPE or \
                            sym.binding != SYMBOL_BINDINGS.GLOBAL or \
                            sym.visibility != SYMBOL_VISIBILITY.HIDDEN }
    if namedsyms:
        namedsyms = set(namedsyms)
        nosyms = namedsyms - allsyms
        for nosym in nosyms:
            warn("No symbol " + nosym + " in " + objfile_in + ": ignored")
        for sym in namedsyms & nasyms:
            warn("Input symbol " + sym + \
                " is not a hidden global symbol defined in " + objfile_in + \
                ": ignored")
        selectedsyms = namedsyms - nosyms
    else:
        selectedsyms = allsyms

    selectedsyms -= nasyms
    unhidden = 0
    for sym in binary.symbols:
        if sym.name in selectedsyms:
            sym.visibility = SYMBOL_VISIBILITY.DEFAULT
            unhidden += 1
            print("Unhidden: " + sym.name)
    print("{} symbols were unhidden".format(unhidden))
    binary.write(objfile_out)

def get_args():
    parser = argparse.ArgumentParser(
        description="Replace hidden with default visibility on " + \
            "global symbols defined in an ELF object file.")
    parser.add_argument("ELFIN",help="ELF object file to read")
    parser.add_argument("-s","--symbol",metavar="SYMBOL",action="append",
        help="Unhide SYMBOL. " + \
            "If unspecified, unhide all hidden global symbols defined in ELFIN")
    parser.add_argument("--symfile",
        help="File of whitespace-delimited symbols to unhide")
    parser.add_argument("-o","--out",metavar="ELFOUT",
        help="ELF object file to write. If unspecified, rewrite ELFIN")
    return parser.parse_args()


def main():
    args = get_args()
    objfile_in = args.ELFIN
    objfile_out = args.out
    symlist = args.symbol
    if not symlist:
        symlist = []
    symfile = args.symfile
    if symfile:
        with open(symfile,"r") as fh:
            symlist += [word for line in fh for word in line.split()]
    unhide(objfile_in,objfile_out,symlist)

if __name__ == "__main__":
    main()

Usage:

$ ./unhide.py -h
usage: unhide.py [-h] [-s SYMBOL] [--symfile SYMFILE] [-o ELFOUT] ELFIN

Replace hidden with default visibility on global symbols defined in an ELF
object file.

positional arguments:
  ELFIN                 ELF object file to read

optional arguments:
  -h, --help            show this help message and exit
  -s SYMBOL, --symbol SYMBOL
                        Unhide SYMBOL. If unspecified, unhide all hidden
                        global symbols defined in ELFIN
  --symfile SYMFILE     File of whitespace-delimited symbols to unhide
  -o ELFOUT, --out ELFOUT
                        ELF object file to write. If unspecified, rewrite
                        ELFIN

And here is a shell script:

unhide.sh

#!/bin/bash

OLD_ARCHIVE=$1
NEW_ARCHIVE=$2
OBJS=$(ar t "$OLD_ARCHIVE")
# $OBJS is deliberately left unquoted so that it splits into member names.
for obj in $OBJS; do
    rm -f "$obj"
    ar xv "$OLD_ARCHIVE" "$obj"
    ./unhide.py "$obj"
done
rm -f "$NEW_ARCHIVE"
ar rcs "$NEW_ARCHIVE" $OBJS
echo "$NEW_ARCHIVE made"

that takes:

  • $1 = Name of an existing static library
  • $2 = Name for a new static library

and creates $2 containing the object files from $1, each modified with unhide.py to unhide all of its hidden global definitions.

Back with our illustration, we can run:

$ ./unhide.sh libabde.a libnew.a
x - a.o
0 symbols were unhidden
x - b.o
Unhidden: bb
1 symbols were unhidden
x - de.o
Unhidden: ee
1 symbols were unhidden
libnew.a made

and confirm that worked with:

$ readelf -s libnew.a | grep HIDDEN; echo Done
Done
$ readelf -s libnew.a | egrep '(aa|bb|dd|ee)'
    10: 0000000000000000    19 FUNC    GLOBAL DEFAULT    1 aa
    10: 0000000000000000    19 FUNC    GLOBAL DEFAULT    1 bb
    11: 0000000000000000    19 FUNC    GLOBAL DEFAULT    1 dd
    14: 0000000000000013    19 FUNC    GLOBAL DEFAULT    1 ee

Finally, if we relink the shared library with the new archive:

$ gcc -o libabde.so -shared -Wl,--whole-archive libnew.a -Wl,--no-whole-archive

all of the global symbols from the archive are exported:

$ readelf --dyn-syms libabde.so | egrep '(aa|bb|dd|ee)'
     6: 0000000000001105    19 FUNC    GLOBAL DEFAULT   12 aa
     7: 000000000000112b    19 FUNC    GLOBAL DEFAULT   12 dd
     8: 0000000000001118    19 FUNC    GLOBAL DEFAULT   12 bb
     9: 000000000000113e    19 FUNC    GLOBAL DEFAULT   12 ee

[1] Download C/C++/Python libraries

Debian/Ubuntu provides C/C++ dev package lief-dev.

Answered Nov 15 '22 11:11 by Mike Kinghan