
Why does the linker modify a --defsym "absolute address"

Goal: a shared library to use a function from an executable (which does not export symbols).

Means: gcc -Wl,--defsym,function=0x432238

The man page states that:

"--defsym symbol=expression" Create a global symbol in the output
file, containing the absolute address given by expression.

To my dismay, dlopen() is adding 0x7ffff676f000, the shared library's base address (this is 64-bit code) to the exported "absolute symbol address":

        executable        shared library
        ---------- linker --------------
symbol: 0x432238   =====> 0x7ffff6ba1238

objdump shows the correct symbol address (0x432238) in the library, but once loaded with dlopen(), the symbol has address 0x7ffff6ba1238.
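The two addresses in the diagram are consistent with the loader simply adding the library's load base to the symbol value. A minimal sketch of that arithmetic follows; the function names `relocated_address` and `original_value` are made up for illustration and are not part of any loader API.

```c
#include <stdint.h>

/* What the dynamic loader effectively computes for a symbol defined in a
 * shared object: the value stored in the symbol table plus the load base. */
static uint64_t relocated_address(uint64_t load_base, uint64_t st_value)
{
    return load_base + st_value;
}

/* The inverse: recover the value given to --defsym from the address
 * observed at run time. */
static uint64_t original_value(uint64_t load_base, uint64_t runtime_address)
{
    return runtime_address - load_base;
}
```

With the numbers from the question, `relocated_address(0x7ffff676f000, 0x432238)` gives `0x7ffff6ba1238`, which is the address seen after dlopen().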

If, once loaded, I manually patch the library symbol to the correct address then all works fine (else, the library SEGFAULTs).
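One possible way to recover the intended value at run time, without touching any relocation table, is to look up the library's load bias and subtract it again. The sketch below is only an illustration under two assumptions: that the loader added exactly the load bias to the symbol, as described above, and that glibc's `dl_iterate_phdr` is available. The names `bias_query`, `find_bias_cb`, `load_bias_of` and `unrelocated` are hypothetical helpers, not an existing API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for dl_iterate_phdr in <link.h> */
#endif
#include <link.h>
#include <stddef.h>
#include <stdint.h>

struct bias_query {
    uintptr_t addr;   /* an address known to lie inside the object of interest */
    uintptr_t bias;   /* filled in: that object's load bias */
    int found;
};

/* Called once per loaded object; checks whether q->addr falls inside one
 * of the object's PT_LOAD segments. */
static int find_bias_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct bias_query *q = data;
    (void) size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;
        if (q->addr >= start && q->addr - start < ph->p_memsz) {
            q->bias = info->dlpi_addr;
            q->found = 1;
            return 1;   /* a non-zero return stops the iteration */
        }
    }
    return 0;
}

/* Returns 1 and stores the load bias of the object containing `anchor`,
 * or returns 0 if no loaded object contains it. */
static int load_bias_of(const void *anchor, uintptr_t *bias)
{
    struct bias_query q = { (uintptr_t) anchor, 0, 0 };
    dl_iterate_phdr(find_bias_cb, &q);
    if (q.found)
        *bias = q.bias;
    return q.found;
}

/* Undo the bias that was added to a symbol meant to be absolute.
 * `anchor` is any address inside the same library, e.g. one of its functions. */
static void *unrelocated(const void *sym_addr, const void *anchor)
{
    uintptr_t bias;
    if (!load_bias_of(anchor, &bias))
        return NULL;
    return (void *) ((uintptr_t) sym_addr - bias);
}
```

Inside the library, a call such as `unrelocated((const void *) &function, (const void *) &some_library_function)` would then yield the address originally passed to --defsym, provided the assumption about the added bias holds on the system in question.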

  • Why is the "absolute address" modified?
  • How to avoid it?

Update:

I contest the technical relevance of the reply below, and, even more its 'update':

Having --defsym to define a relocated symbol in a PIC library/executable is pointless (it does not serve ANY purpose other than polluting the binary without any usable feature).

Therefore, the only relevant use of --defsym in a PIC shared library or PIC executable should be to define a (non-relocated) "absolute address".

Incidentally, that's the official purpose of --defsym if you bother to read the man page:

"Create a global symbol in the output file, containing the absolute address given by expression."

At best, this is a Linux linker defect which would be trivial to fix. And for those who can't wait for the people-in-denial to realize (and fix) their mistake, the solution is to patch the relocation table after the binary image has been loaded by the defective linker.

Then --defsym becomes useful in PIC libraries/executables, which seems to me to be welcome progress.

Gil asked Dec 03 '11 14:12


3 Answers

The behavior of --defsym changed between gcc 5.4.0 (using Ubuntu 16.04.4) and 7.3.0 (Ubuntu 18.04). At 5.4.0, --defsym created a symbol representing an absolute, non-relocatable address. At 7.3.0, readelf -s shows the symbol as "ABS", but the symbol is in fact relocated when the program executes. (This has caused problems for my application, too.)

An absolute address might represent something like a memory-mapped device register or an interrupt vector, something that stays in one place irrespective of where the application is loaded. The older behavior is correct - absolute addresses must not be relocated. If the relocation happens when the executable image is loaded into memory, it might not be a gcc problem, but it is a problem.

Tom VanCourt answered Nov 15 '22 18:11


Adding a counterpoint: yes there is an actual use to this but I think it's indeed broken, not only with dynamic libraries but also with position-independent executables.

ld itself will use symbols when used to embed binary files into executables:

ld -r -b binary hello_world.txt -o hello_world.o

this will produce an object file with, among others, the following symbols:

000000000000000c g       .data  0000000000000000 _binary_hello_world_txt_end
000000000000000c g       *ABS*  0000000000000000 _binary_hello_world_txt_size
0000000000000000 g       .data  0000000000000000 _binary_hello_world_txt_start

so that an executable that includes them can just use extern variables to access them. (That is, our "hello world" text from hello_world.txt is the only thing in the .data section, with length 0xc.)

Linking this object file into an executable file (and not stripping symbols) results in

0000000000411040 g     .data  0000000000000000              _binary_hello_world_txt_start
000000000041104c g     .data  0000000000000000              _binary_hello_world_txt_end
000000000000000c g     *ABS*  0000000000000000              _binary_hello_world_txt_size

and we can do things like

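extern char _binary_hello_world_txt_start;
extern char _binary_hello_world_txt_size; // "char" is just made up; only the symbol's address (its value) matters

// (...)
// the embedded text is not NUL-terminated, so bound the print by its size
size_t size = (size_t) &_binary_hello_world_txt_size;
printf("text: %.*s\n", (int) size, &_binary_hello_world_txt_start);
printf("number of bytes in it: %zu\n", size);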

(Yes, it looks fairly weird that we're taking the address of something (which is what symbols are usually used for) and then treating it as an integer... but it actually works.)

Note also how the linker does know what it should relocate and what it shouldn't; the data pointers are relative to .data, while the size is *ABS*, which, as Gil describes, is not supposed to be relocated (... since it isn't calculated relatively to anything).

However, this only works in non-position-independent executables. Once you go from -no-pie to -fPIE (which appears to be gcc's default in modern Linux distros), the dynamic linker relocates everything, including *ABS* symbols. This happens at run time, during dynamic linking: the symbol tables look the same regardless of how the executable was compiled.

The fact that the same thing happens for shared libraries seems to be a consequence of the same thing: the relocation of dynamically placed binaries (either a position-independent executable or a shared library) results in similar relocations, which do make sense for functions included in the binary itself, but not for *ABS* data.
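For the embedded-file case specifically, the size can also be derived from the two .data-relative symbols instead of the *ABS* one. Both are shifted by the same amount when the binary is loaded, so their difference should not depend on the load address. A small sketch, where `blob_size` is a made-up helper name:

```c
#include <stddef.h>

/* Size of an embedded blob computed from its start and end addresses.
 * Both addresses are shifted by the same load bias, so it cancels out.
 *
 * Intended use with the object produced by "ld -r -b binary":
 *   extern char _binary_hello_world_txt_start;
 *   extern char _binary_hello_world_txt_end;
 *   size_t n = blob_size(&_binary_hello_world_txt_start,
 *                        &_binary_hello_world_txt_end);
 */
static size_t blob_size(const char *start, const char *end)
{
    return (size_t) (end - start);
}
```

This sidesteps the problem for sizes only; it does not help when the *ABS* symbol is meant to carry a real address, as in Gil's question.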

Sadly, I don't have an answer to either of the questions: I also think it's done incorrectly, and I do not know how to fix it (see "Getting the value of *ABS* symbols from C" for another issue bumping into the same problem).

However, given that even GNU ld itself chooses to embed a size as a symbol this way, I do think this application / question is entirely valid. So, as for an answer:

  • ... it's done because the implementation isn't actually correct
  • as a workaround, "generating a header file with absolute addresses inline" comes to mind, following Employed Russian's answer

... but I'd actually be interested in how exactly to patch the relocation table the way Gil mentioned in the question!

Latanius answered Nov 15 '22 17:11


You appear to have fundamentally misunderstood what --defsym does.

--defsym=symbol=expression
   Create a global symbol in the *output* file, ...

That is, you are creating the new symbol in the library that you are building. As such, the symbol is (naturally) relocated with the library.

I am guessing you want something like this instead:

// code in library
int fn()
{
    // exe_fn not exported from the executable, but we know where it is.
    int (*exe_fn)(void) = (int (*)(void)) 0x432238;
    return (*exe_fn)();
}

If you didn't want to hard-code 0x432238 into the library, and instead pass the value on command line at build time, just use a -DEXE_FN=0x432238 to achieve that.
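A sketch of that -D variant could look as follows. The default value, the `exe_fn_t` typedef and the `exe_fn_ptr` helper are illustrative choices, and the whole approach presumably only holds when the executable is loaded at a fixed address (not position-independent), so that 0x432238 is the same on every run.

```c
#include <stdint.h>

/* Fallback for illustration only; normally supplied at build time with
 * -DEXE_FN=0x432238 (or whatever address the executable uses). */
#ifndef EXE_FN
#define EXE_FN 0x432238
#endif

typedef int (*exe_fn_t)(void);

/* Turn the build-time constant into a callable pointer. No symbol is
 * involved, so the dynamic loader has nothing to relocate. */
static exe_fn_t exe_fn_ptr(void)
{
    return (exe_fn_t) (uintptr_t) EXE_FN;
}

/* Only safe to call when the library is loaded into the executable that
 * really has a function at EXE_FN. */
int fn(void)
{
    return exe_fn_ptr()();
}
```

Building the library with something like `gcc -shared -fPIC -DEXE_FN=0x432238 ...` keeps the address out of the source while still avoiding --defsym.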

Update:

Goal: a shared library to use a function from an executable

That goal cannot be achieved by the method you selected. You'll have to use other means.

Why is the "absolute address" modified?

It isn't. When you ask the linker to define function at absolute address 0x432238, it does exactly that. You can see it in objdump, nm and readelf -s output.

But because the symbol is defined in the shared library, all references to that symbol are relocated, i.e. adjusted by the shared library load address (that is done by the dynamic loader). It makes no sense whatsoever for the dynamic loader to do otherwise.

How to avoid it?

You can't. Use other means to achieve your goal.

Employed Russian answered Nov 15 '22 18:11