
Find program's code address at runtime?

Tags:

c

When I debug a C program with gdb, the disassemble command shows the instructions and their addresses in the code segment. Is it possible to find out those memory addresses at runtime? I am using Ubuntu. Thank you.

[edit] To be more specific, I will demonstrate it with the following example.

#include <stdio.h>
#include <stdlib.h>

void myfunction(void);   /* defined elsewhere */

int main(int argc, char *argv[]){
    myfunction();
    exit(0);
}

Now I would like to get the address of myfunction() in the code segment while my program is running.

wakandan asked Jul 01 '09 15:07


3 Answers

The other answers are vastly overcomplicated. If the function reference is static, as it is in the question, the address is simply the value of the function name in pointer context:

void* myfunction_address = myfunction;

If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (Windows) is likewise the address of the function.

Note that the above code is likely to generate a warning with some compilers, as ISO C does not define conversions between function pointers and object pointers such as void * (some architectures put code and data in physically distinct address spaces).
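One way to sidestep that warning is to keep the address in a properly typed function pointer and convert it only when it has to be printed. The following is a minimal sketch, not the asker's actual code: myfunction() here is a hypothetical stand-in, and myfunction_ptr, function_address() and print_function_address() are names made up for illustration. The cast from a function pointer to uintptr_t is implementation-defined, although it behaves as expected with gcc on Linux.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the asker's myfunction(). */
int myfunction(void)
{
    return 42;
}

/* A typed function pointer: no code/data pointer conversion is involved. */
typedef int (*myfunction_ptr)(void);

/* Returns the function's address as an integer, e.g. for logging.
 * Assumption: the function-pointer-to-integer cast is implementation-defined;
 * it yields the code address with gcc on Linux. */
uintptr_t function_address(myfunction_ptr fn)
{
    return (uintptr_t)fn;
}

/* Prints the runtime address of myfunction() in hexadecimal. */
void print_function_address(void)
{
    myfunction_ptr fn = myfunction;
    printf("myfunction is at 0x%jx\n", (uintmax_t)function_address(fn));
}
```

Calling print_function_address() from main() prints the address, which should match what gdb's disassemble reports for the same running process.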

And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.

And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on Windows is similar) that then jumps to the function.
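For the dynamic case, a lookup by name with dlsym() might look like the sketch below. lookup_symbol() is a hypothetical helper, not part of any library. It assumes a POSIX system, and on older glibc versions the program has to be linked with -ldl. Symbols defined in the main executable are only found if they are exported to the dynamic symbol table, for example by linking with -rdynamic.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Looks up `name` among the symbols visible to the running process.
 * Returns the symbol's address, or NULL if it cannot be found. */
void *lookup_symbol(const char *name)
{
    /* A NULL filename asks for a handle to the main program; lookups on it
     * also search the shared libraries loaded at program startup. */
    void *handle = dlopen(NULL, RTLD_NOW);
    void *addr;

    if (handle == NULL)
        return NULL;

    addr = dlsym(handle, name);
    dlclose(handle);
    return addr;
}
```

For example, lookup_symbol("puts") returns the address of puts() in the loaded C library, while an unknown name yields NULL.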

Andy Ross answered Nov 13 '22 19:11


If you know the function name before the program runs, simply use

void * addr = myfunction;

If the function name is only given at run time, you can look the symbol up dynamically. I once wrote a function that does this with the BFD library. The code below was written for x86_64; in this example you would get the address via find_symbol("a.out", "myfunction").

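#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the value of symname from filename's symbol table, or -1 on
 * failure. This is the address recorded in the file; for a
 * position-independent executable the load base must be added to it. */
long find_symbol(const char *filename, const char *symname)
{
    bfd *ibfd;
    asymbol **symtab = NULL;
    long nsize, nsyms, i;
    long addr = -1;
    symbol_info syminfo;

    bfd_init();
    ibfd = bfd_openr(filename, NULL);
    if (ibfd == NULL) {
        fprintf(stderr, "bfd_openr error\n");
        return -1;
    }

    if (!bfd_check_format(ibfd, bfd_object)) {
        fprintf(stderr, "not an object file\n");
        goto out;
    }

    nsize = bfd_get_symtab_upper_bound(ibfd);
    if (nsize <= 0)
        goto out;               /* error, or no symbol table (stripped) */
    symtab = malloc(nsize);
    if (symtab == NULL)
        goto out;
    nsyms = bfd_canonicalize_symtab(ibfd, symtab);

    for (i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i]->name, symname) == 0) {
            bfd_symbol_info(symtab[i], &syminfo);
            addr = (long) syminfo.value;
            break;
        }
    }
    if (addr == -1)
        fprintf(stderr, "cannot find symbol\n");

out:
    free(symtab);               /* free(NULL) is a no-op */
    bfd_close(ibfd);
    return addr;
}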
ZelluX answered Nov 13 '22 18:11


To get a backtrace, use execinfo.h as documented in the GNU libc manual.

For example:

#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>


void trace_pom()
{   
    const int sz = 15;
    void *buf[sz];

    // get at most sz entries
    int n = backtrace(buf, sz);

    // output them right to stderr
    backtrace_symbols_fd(buf, n, fileno(stderr));

    // but if you want to output the strings yourself
    // you may use char ** backtrace_symbols (void *const *buffer, int size)
    write(fileno(stderr), "\n", 1);
}


void TransferFunds(int n);

void DepositMoney(int n)
{   
    if (n <= 0)
        trace_pom();
    else TransferFunds(n-1);
}


void TransferFunds(int n)
{   
    DepositMoney(n);
}


int main()
{   
    DepositMoney(3);

    return 0;
}

Compiled with:

gcc a.c -o a -g -Wall -Werror -rdynamic

According to the GNU libc manual:

Currently, the function name and offset can only be obtained on systems that use the ELF binary format for programs and libraries. On other systems, only the hexadecimal return address will be present. Also, you may need to pass additional flags to the linker to make the function names available to the program. (For example, on systems using GNU ld, you must pass -rdynamic.)

Output

./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
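The comment in trace_pom() mentions backtrace_symbols() as the way to format the strings yourself. A minimal sketch of that variant follows. print_trace() and its 64-frame limit are choices made for this illustration, and it assumes glibc's execinfo.h is available.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes up to max_frames entries of the current call stack to `out`,
 * one per line, as "#index address symbol-string".
 * Returns the number of frames written, or 0 on failure. */
int print_trace(FILE *out, int max_frames)
{
    void *buf[64];
    char **names;
    int n, i;

    if (max_frames < 1)
        return 0;
    if (max_frames > 64)
        max_frames = 64;        /* illustrative cap: size of buf */

    n = backtrace(buf, max_frames);
    names = backtrace_symbols(buf, n);
    if (names == NULL)
        return 0;

    for (i = 0; i < n; i++)
        fprintf(out, "#%d %p %s\n", i, buf[i], names[i]);

    free(names);    /* a single free() releases the array and its strings */
    return n;
}
```

Unlike backtrace_symbols_fd(), backtrace_symbols() allocates memory with malloc(), so it is less suitable for use inside a signal handler or after heap corruption.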
Adrian Panasiuk answered Nov 13 '22 18:11