
Given return address, how to get the address of the function?

Tags: c, x86, assembly

Suppose that in a piece of C code, I have a function foo that calls bar. While inside bar, I can use assembly to get the address to which bar will return. How do I use this information to determine the address of foo?

One approach would be to obtain the return address that foo will return to, and get the address from the opcode of the call instruction that calls foo. However, this requires knowing which form of call (e.g. relative offset vs. absolute) is used, and is therefore unreliable. Is there an easier way to determine the address of the caller?

Edit: I forgot to mention that this question is about IA-32 assembly on 32-bit Intel Unix machines.

user2467539 asked Mar 23 '23 15:03


2 Answers

On Linux, you can use dladdr() to resolve the calling function:

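#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

...

/* Address in the caller to which the current function returns. */
void *retAddr = __builtin_extract_return_addr(__builtin_return_address(0));
Dl_info d;
(void)dladdr(retAddr, &d);
/* Print the caller's symbol name and the offset of the return address within it. */
printf("%s called from %s + 0x%tx\n",
    __func__,
    d.dli_sname,
    (char *)retAddr - (char *)d.dli_saddr);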

See the GCC docs for __builtin_return_address() and the Linux manpage dladdr(3) for details.

The function dladdr() is available on Solaris/MacOSX/*BSD as well, but needs preprocessor defines other than _GNU_SOURCE to become visible; see the manpages for the respective operating system(s).

Edit: Note that since this relies on the presence of a symbol table, it might not resolve successfully on stripped binaries. I've not tried to add error handling to the above; in general, any type of automatic backtracing (with function name resolution) support doesn't like symbol tables being stripped off.
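The snippet above skips error handling. A minimal sketch of what a checked version might look like follows; the helper names describe_addr and who_called_me are made up for illustration, and it assumes glibc and GCC (or Clang). dladdr() returns 0 when the address is not inside any loaded object, and leaves dli_sname NULL when the object is known but no symbol matches.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: write "symbol+0xoff (object)" for `addr` into buf.
 * Returns 0 if dladdr() found a containing object, -1 otherwise. */
int describe_addr(void *addr, char *buf, size_t len)
{
    Dl_info d;

    if (dladdr(addr, &d) == 0) {
        snprintf(buf, len, "%p (no containing object)", addr);
        return -1;
    }

    const char *obj = d.dli_fname ? d.dli_fname : "?";
    if (d.dli_sname == NULL) {
        /* Object found, but no symbol covers this address
         * (stripped binary, static function, not exported, ...). */
        snprintf(buf, len, "%p (in %s, no symbol)", addr, obj);
        return 0;
    }

    snprintf(buf, len, "%s+0x%tx (%s)", d.dli_sname,
             (char *)addr - (char *)d.dli_saddr, obj);
    return 0;
}

/* noinline, so that the return address really points into the caller. */
__attribute__((noinline)) int who_called_me(char *buf, size_t len)
{
    void *ret = __builtin_extract_return_addr(__builtin_return_address(0));
    return describe_addr(ret, buf, len);
}
```

Calling who_called_me(buf, sizeof buf) from foo fills buf with a description of foo. For functions in the main executable, the name usually only resolves if the program was linked with -rdynamic, and older glibc versions also need -ldl.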

For a really quick one, I sometimes simply use:

#include <execinfo.h>
#include <unistd.h>   /* for STDERR_FILENO */

...

void *retAddr[10];
backtrace_symbols_fd(retAddr, backtrace(retAddr, 10), STDERR_FILENO);

as that gets a stack trace up to ten entries deep. Again, this relies on the symbol tables not having been stripped. There's a performance penalty, since you're resolving more than a single address.

Edit2: Without symbol tables (which, among other things, contain the start address and size of each function within the executable/library), the notion of a "start address" is rather meaningless. As far as the CPU itself is concerned, no record is kept of how the instruction pointer arrived where it is at a given moment; the assembly equivalent of goto (jmp) or other strange concoctions of self-modifying instructions are just as "valid" to the CPU as properly structured, compiler-generated code. x86 instructions are variable-size, and the opcode map is dense enough that just about any random sequence of bytes makes up a "valid" instruction stream; heuristic backwards disassembly of binary code is therefore not 100% safe.

Symbol tables, in that sense, establish "markers" for debuggers as well. You can expect to find a valid instruction stream if you start disassembling at function start addresses as recorded in the symbol table, and you can cross-verify that by checking that any return addresses found in backtraces are actually preceded by a call instruction.
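To illustrate the "preceded by a call instruction" check, here is a rough sketch that only recognises the direct near call of IA-32 (opcode 0xE8 followed by a 32-bit relative displacement). The function name and the idea of working on a copied byte buffer are assumptions made for the example. Indirect calls (opcode 0xFF /2) have other lengths and do not encode the target at all, and a 0xE8 byte five bytes before the return address can also be the tail of some other instruction, so a match is a hint rather than proof.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. `code` is a copy of `len` bytes that are mapped at
 * address `base`; `ret` is a return address within that range. If the five
 * bytes before `ret` look like `call rel32`, store the call target in
 * *target and return 1; otherwise return 0. */
int call_target_before(const uint8_t *code, size_t len, uint32_t base,
                       uint32_t ret, uint32_t *target)
{
    if (ret < base || ret - base > len || ret - base < 5)
        return 0;                       /* no room for a 5-byte call */

    size_t off = ret - base - 5;        /* where the call would start */
    if (code[off] != 0xE8)
        return 0;                       /* not a direct near call */

    /* The displacement is stored little-endian after the opcode. */
    uint32_t rel = (uint32_t)code[off + 1]
                 | (uint32_t)code[off + 2] << 8
                 | (uint32_t)code[off + 3] << 16
                 | (uint32_t)code[off + 4] << 24;

    /* rel32 is relative to the next instruction, i.e. the return address;
     * unsigned arithmetic wraps modulo 2^32, so negative offsets work too. */
    *target = ret + rel;
    return 1;
}
```

Applied to the return address of foo, a match would yield the entry point of foo, provided foo was reached through a direct call and not through a PLT stub or a function pointer.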

FrankH. answered Mar 26 '23 04:03


"One approach would be to obtain the return address that foo will return to, and get the address from the opcode of the call instruction that calls foo."

Eh? That will give you the address of bar, not foo.

All you need is the highest procedure entry point that is lower than the return address.
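As a sketch of that lookup, assuming the procedure entry points have already been collected into an ascending array (for instance from the symbol table; how they are obtained is outside this example, and the function name is made up):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. `entries` holds function start addresses sorted in
 * ascending order. Returns the highest entry that is strictly lower than
 * `ret`, or 0 if there is none. */
uintptr_t enclosing_entry(const uintptr_t *entries, size_t n, uintptr_t ret)
{
    size_t lo = 0, hi = n;              /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid] < ret)
            lo = mid + 1;               /* entries[mid] qualifies; look higher */
        else
            hi = mid;                   /* entries[mid] is too high */
    }

    /* lo is now the number of entries below ret. */
    return lo > 0 ? entries[lo - 1] : 0;
}
```

The comparison is strict on purpose: a return address always lies after a call instruction, so if it happens to equal an entry point (a function whose last instruction is a call that never returns), it still belongs to the preceding function.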

user207421 answered Mar 26 '23 05:03