Why symbols malloc, __malloc and __libc_malloc point to the same code address?

Question

When I grep malloc from the symbol table, with the following command

readelf -s bin | grep malloc

I can see symbols malloc, __malloc and __libc_malloc share the same code address. I can get the PC address, want to know when a user program calls malloc, but __malloc and __libc_malloc gave me noisy information, any good ways to differentiate malloc out? As I compiled the binary with -static, so dlsym doesn't work in this case.

Petesh · Accepted Answer

You're not going to be able to tell them apart unless you use dynamic linking as they will be the same thing, and the act of static linking will replace the name references with the address of the routine.

Take an example:

#include <stdlib.h>

extern void *__malloc(size_t);
extern void *__libc_malloc(size_t);

int
main(int argc, char **argv)
{
    void *v = malloc(200);
    free(v);
    v = __malloc(200);
    free(v);
    v = __libc_malloc(200);
    free(v);
    return 0;
}

When compiled using: gcc -static -o example example.c, and then we disassemble the main routine we see:

  40103e:       55                      push   %rbp
  40103f:       48 89 e5                mov    %rsp,%rbp
  401042:       48 83 ec 20             sub    $0x20,%rsp
  401046:       89 7d ec                mov    %edi,-0x14(%rbp)
  401049:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
  40104d:       bf c8 00 00 00          mov    $0xc8,%edi
  401052:       e8 19 52 00 00          callq  406270 <__libc_malloc>
  401057:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  40105b:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  40105f:       48 89 c7                mov    %rax,%rdi
  401062:       e8 09 56 00 00          callq  406670 <__cfree>
  401067:       bf c8 00 00 00          mov    $0xc8,%edi
  40106c:       e8 ff 51 00 00          callq  406270 <__libc_malloc>
  401071:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  401075:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  401079:       48 89 c7                mov    %rax,%rdi
  40107c:       e8 ef 55 00 00          callq  406670 <__cfree>
  401081:       bf c8 00 00 00          mov    $0xc8,%edi
  401086:       e8 e5 51 00 00          callq  406270 <__libc_malloc>
  40108b:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  40108f:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  401093:       48 89 c7                mov    %rax,%rdi
  401096:       e8 d5 55 00 00          callq  406670 <__cfree>
  40109b:       b8 00 00 00 00          mov    $0x0,%eax
  4010a0:       c9                      leaveq 
  4010a1:       c3                      retq   
  4010a2:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  4010a9:       00 00 00 
  4010ac:       0f 1f 40 00             nopl   0x0(%rax)

i.e. the code doesn't differentiate the entries.

Now, if you use dynamic linking; you get a different result. For one thing, __malloc is not available in the resulting binary - this is because the __malloc name is a side-effect of the static linking (there is a way to prevent it from being produced, but the mechanism escapes me at the moment). So when we compile the binary (removing the __malloc call), main looks like:

  40058d:       55                      push   %rbp
  40058e:       48 89 e5                mov    %rsp,%rbp
  400591:       48 83 ec 20             sub    $0x20,%rsp
  400595:       89 7d ec                mov    %edi,-0x14(%rbp)
  400598:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
  40059c:       bf c8 00 00 00          mov    $0xc8,%edi
  4005a1:       e8 ea fe ff ff          callq  400490 <malloc@plt>
  4005a6:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  4005aa:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4005ae:       48 89 c7                mov    %rax,%rdi
  4005b1:       e8 9a fe ff ff          callq  400450 <free@plt>
  4005b6:       bf c8 00 00 00          mov    $0xc8,%edi
  4005bb:       e8 c0 fe ff ff          callq  400480 <__libc_malloc@plt>
  4005c0:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  4005c4:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4005c8:       48 89 c7                mov    %rax,%rdi
  4005cb:       e8 80 fe ff ff          callq  400450 <free@plt>
  4005d0:       b8 00 00 00 00          mov    $0x0,%eax
  4005d5:       c9                      leaveq 
  4005d6:       c3                      retq   
  4005d7:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  4005de:       00 00

So to determine the use of __libc_malloc or malloc, you can check for calls to the plt entry for the routine.

This of course all assumes that you're actually performing some type of static analysis of the binary. If you're doing this at run-time, the usual method is library interception using LD_PRELOAD, which is a whole different question.

Why symbols malloc, malloc and libc_malloc point to the same code address?

Tags:

c++

c

linux

symbols

gcc

user1147800

1 Answers

Petesh

Recent Activity

Donate For Us