
C function profiling (addresses seem to be offset)

I'm trying to profile function calls using GCC's -finstrument-functions option. Basically, I added the following to each compiled source file:

#include <stdio.h>

static int __stepper=0;
void __cyg_profile_func_enter(void *this_fn, void *call_site)
                              __attribute__((no_instrument_function));
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
  int i=0;
  for( ; i<__stepper; i++ ) printf(" ");
  printf("E: %p %p\n", this_fn, call_site);
  __stepper ++;
} /* __cyg_profile_func_enter */

void __cyg_profile_func_exit(void *this_fn, void *call_site)
                             __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
  int i=0;
  __stepper --;
  for( ; i<__stepper; i++ ) printf(" ");
  printf("L:  %p %p\n", this_fn, call_site);
} /* __cyg_profile_func_exit */

And got the following results:

 E: 0xb7597ea0 0xb75987a8
  E: 0xb7597de0 0xb7597ef5
  L:  0xb7597de0 0xb7597ef5
 L:  0xb7597ea0 0xb75987a8

All the function addresses are in that region (0xb7......). But if I read the function symbols with 'readelf -s', I get the following:

2157: 00101150   361 FUNC    LOCAL  DEFAULT   13 usb_audio_initfn
2158: 00100940   234 FUNC    LOCAL  DEFAULT   13 usb_audio_handle_reset
2159: 00100de0   867 FUNC    LOCAL  DEFAULT   13 usb_audio_handle_control

The addresses of all the functions in the binary are around 0x00......, so I cannot map the function pointers back to function names. It looks like the function pointers have somehow been shifted by an offset.

Does anybody have any idea?

jaeyong asked Oct 29 '13 10:10


2 Answers

What you need is the dladdr function. Provided the function's symbol is present in the dynamic symbol table of the module (your main program or a shared library) in which it is defined, calling dladdr gives you the function name for a given address, and also the base address at which the module (e.g. your shared library) is loaded. For a main program this usually means linking with -rdynamic; static functions are not exported, so dladdr can report their module and base address but not their name:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <string.h>   /* memset */

void find_func(void* pfnFuncAddr)
{
    Dl_info info;
    memset(&info,0,sizeof(info));
    if(dladdr(pfnFuncAddr,&info) && info.dli_sname)
    {
            /*here: 'info.dli_sname' contains the function name */
            /*      'info.dli_fname' contains the path of the module that holds it */
            /*      'info.dli_fbase' contains the address at which that module is loaded */
    }
    else
    {
           /* if we got here it means that no exported symbol covers this address
              (e.g. a static function, or a program linked without -rdynamic)
              or some other funny thing happened (e.g. we called a function
              written purely in assembly) */
    }
}

You have to add -ldl when linking.

Bear in mind that:

  • find_func must be called from within your profiled process (read: somewhere in your __cyg_profile_func_enter or __cyg_profile_func_exit functions), because pfnFuncAddr is a runtime address that only has meaning inside that process (read: it should equal the this_fn or call_site argument of the __cyg_* functions).

  • The function name that you get may be mangled (if it is a C++ function or a method of a class). You can demangle the name with the command-line tool c++filt. If you want to demangle from your profiler code, look at the bfd library and functions such as bfd_read_minisymbols, bfd_demangle and friends. If you really want to profile your code, demangling all the function names later (after profiling) may be a good idea.

  • The difference you observed between the printed addresses and the values shown by readelf -s is the base address at which the module that contains the function was loaded (read: info.dli_fbase). Subtracting that base from the runtime address of a function gives the value readelf reports for it.
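The points above can be sketched as an enter hook that prints a module-relative offset, which can then be matched against readelf -s even when no name is available. This is only a minimal sketch under stated assumptions: module_offset is a hypothetical helper name, the target is glibc on Linux (older glibc versions need -ldl), and the subtraction matches readelf only for a module whose first loadable segment is linked at address 0, which is typical for shared libraries and position-independent executables.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: offset of addr from the load base of its module,
   or UINTPTR_MAX if dladdr cannot place the address in any module. */
__attribute__((no_instrument_function))
static uintptr_t module_offset(void *addr, const char **module, const char **symbol)
{
    Dl_info info;

    assert(addr != NULL);
    if (dladdr(addr, &info) == 0 || info.dli_fbase == NULL)
        return UINTPTR_MAX;
    if (module)
        *module = info.dli_fname;   /* path of the containing module */
    if (symbol)
        *symbol = info.dli_sname;   /* NULL when the symbol is not exported */
    return (uintptr_t)addr - (uintptr_t)info.dli_fbase;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    const char *module = NULL, *symbol = NULL;
    uintptr_t off = module_offset(this_fn, &module, &symbol);

    (void)call_site;
    if (off != UINTPTR_MAX)
        fprintf(stderr, "E: %s+0x%lx (%s)\n",
                module ? module : "?", (unsigned long)off,
                symbol ? symbol : "?");
}
```

For static functions such as the ones in the question the name will print as "?", but the printed offset is the number to search for in the readelf -s output of the reported module.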

I hope that helps.

sirgeorge answered Sep 18 '22 15:09


From the question it looks like you're profiling a library function.

To find out which functions are being measured, you have 2 options:

1 Run the program which uses the library under gdb and stop at main. At this point, get the pid of the program (PID=...) and run `cat /proc/$PID/maps`. You should see something like this:

➜  ~  ps
  PID TTY          TIME CMD
18533 pts/4    00:00:00 zsh
18664 pts/4    00:00:00 ps
➜  ~  PID=18533
➜  ~  cat /proc/$PID/maps
00400000-004a2000 r-xp 00000000 08:01 3670052                            /bin/zsh5
006a1000-006a2000 r--p 000a1000 08:01 3670052                            /bin/zsh5
006a2000-006a8000 rw-p 000a2000 08:01 3670052                            /bin/zsh5
006a8000-006bc000 rw-p 00000000 00:00 0 
...
7fa174cc9000-7fa174ccd000 r-xp 00000000 08:01 528003                     /lib/x86_64-linux-gnu/libcap.so.2.22
7fa174ccd000-7fa174ecc000 ---p 00004000 08:01 528003                     /lib/x86_64-linux-gnu/libcap.so.2.22
7fa174ecc000-7fa174ecd000 r--p 00003000 08:01 528003                     /lib/x86_64-linux-gnu/libcap.so.2.22
7fa174ecd000-7fa174ece000 rw-p 00004000 08:01 528003                     /lib/x86_64-linux-gnu/libcap.so.2.22
...

Here 7fa174cc9000 is the base address of the /lib/x86_64-linux-gnu/libcap.so.2.22 library, so at run time every address that readelf -s reports for it is offset by that value. Knowing the base address, you can calculate back what the original offset in the file was.

For example, if you got the value 7fa174206370 and the base address of the library is 7fa1741cf000, then the offset is 7fa174206370 - 7fa1741cf000 = 37370. In my example it's sigsuspend from GLIBC:

94: 0000000000037370   132 FUNC    WEAK   DEFAULT   12 sigsuspend@@GLIBC_2.2.5
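The same lookup can be done from inside the process. The following is a rough sketch, not a definitive implementation: elf_offset and module_base_of are hypothetical helper names, the code reads /proc/self/maps (Linux only), and it assumes, as the example above does, that the first mapping of a file is its load base.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Runtime address minus load base: the value to look for in readelf -s. */
static uint64_t elf_offset(uint64_t runtime_addr, uint64_t base)
{
    assert(runtime_addr >= base);
    return runtime_addr - base;
}

/* Hypothetical helper: start of the first mapping of the file that backs
   addr in the current process, or 0 if addr is not in a named mapping. */
static uint64_t module_base_of(uint64_t addr)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512], path[256], target[256] = "";
    unsigned long start, end;
    uint64_t base = 0;

    if (maps == NULL)
        return 0;

    /* Pass 1: find which file backs the mapping that contains addr. */
    while (fgets(line, sizeof line, maps)) {
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255[^\n]",
                   &start, &end, path) == 3
            && addr >= start && addr < end) {
            strcpy(target, path);
            break;
        }
    }

    /* Pass 2: maps is sorted by address, so the first line naming that
       file gives the lowest address at which it is mapped. */
    rewind(maps);
    while (target[0] != '\0' && fgets(line, sizeof line, maps)) {
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255[^\n]",
                   &start, &end, path) == 3
            && strcmp(path, target) == 0) {
            base = start;
            break;
        }
    }
    fclose(maps);
    return base;
}
```

With the numbers from the example, elf_offset(0x7fa174206370, 0x7fa1741cf000) yields 0x37370, the value listed for sigsuspend.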

2 Attach gdb to the program which uses these libraries. It will either find the loaded library in memory immediately, or will need to be pointed to the .text section of the library.

> gdb
(gdb) attach YOUR_PID
(a lot of output about symbols)
(gdb) x/i 0x00007fa174206386
=> 0x7fa174206386 <sigsuspend+22>:  cmp    $0xfffffffffffff000,%rax

This way you know that 0x7fa174206386 is inside sigsuspend.

In case gdb doesn't load any symbols by itself (no output like Reading symbols from ... Loading symbols for ... after attach), you can look up the base address of the library as in option 1, then add the offset of the .text section to it:

➜  ~  readelf -S /lib/x86_64-linux-gnu/libcap.so.2.22 | grep '.text.'
  [11] .text             PROGBITS         0000000000001620  00001620

7fa174cc9000 + 0000000000001620 in hexadecimal gives 7FA174CCA620. Then attach with gdb as above and run (note the 0x prefix, since gdb reads numbers as decimal by default):

(gdb) add-symbol-file /lib/x86_64-linux-gnu/libcap.so.2.22 0x7FA174CCA620

Then you should be able to find symbols (via x/i ADDRESS as shown above) even if gdb doesn't load them by itself.

Please ask if anything is unclear, I'll try to explain.

Clarification on why this is so:

The observed behavior is due to the libraries being compiled as Position-Independent Code (PIC), which is what makes dynamic libraries easy to support. PIC essentially means that the library's ELF file has .plt and .got sections and can be loaded at any base address. The PLT is the procedure linkage table; it contains stubs for calls to functions located in other modules. The first call through a stub goes to the program interpreter (the dynamic linker), which resolves the called function; later calls jump straight to the function. This works because the program interpreter updates the GOT (Global Offset Table), which holds the addresses of the functions to call. Initially the GOT is set up so that the first call of a function jumps to the program interpreter routine that resolves the function being called.

On x86-64, PLT entries typically look like this:

0000000000001430 <free@plt>:
    1430:       ff 25 e2 2b 20 00       jmpq   *0x202be2(%rip)        # 204018 <_fini+0x201264>
    1436:       68 00 00 00 00          pushq  $0x0
    143b:       e9 e0 ff ff ff          jmpq   1420 <_init+0x28>

The first jmpq is a jump to the address stored in the GOT at location %rip + 0x202be2:

  [20] .got              PROGBITS         0000000000203fd0  00003fd0
       0000000000000030  0000000000000008  WA       0     0     8

During this jmpq, %rip holds the address of the next instruction, 0x1436, so %rip + 0x202be2 is 0x204018 (the value the disassembler prints in the # 204018 comment). That gets added to the base address of the library to produce the absolute address for the location where the library is actually loaded. For example, if it's loaded at 0x7f66dfc03000, then the resulting address of the corresponding GOT entry will be 0x7F66DFE07018. The address stored at that location is the address of (in this example) the free function. It's maintained by the program interpreter to point to the actual free in libc.
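The address arithmetic for a RIP-relative operand can be written out as a small helper. This is only an illustrative sketch: rip_relative_target is a hypothetical name, and the instruction length (6 bytes here) is read off the disassembly above rather than decoded.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a RIP-relative operand: %rip is the address of the NEXT
   instruction, so the displacement is added to insn_addr + insn_len. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp)
{
    assert(insn_len >= 1 && insn_len <= 15);  /* x86 instructions are 1 to 15 bytes */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For the free@plt entry, rip_relative_target(0x1430, 6, 0x202be2) gives 0x204018, matching the # 204018 annotation in the disassembly; adding a load base of 0x7f66dfc03000 gives 0x7f66dfe07018.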

More information on this can be found here.

Michael Pankov answered Sep 17 '22 15:09