When I grep for malloc in the symbol table with the following command:
readelf -s bin | grep malloc
I can see that the symbols malloc, __malloc and __libc_malloc share the same code address. I can get the PC address and want to know when a user program calls malloc, but __malloc and __libc_malloc give me noisy information. Is there a good way to single out malloc? I compiled the binary with -static, so dlsym doesn't work in this case.
You won't be able to tell them apart unless you use dynamic linking. The three names refer to the same routine, and static linking replaces each name reference with the address of that routine.
Take an example:
#include <stdlib.h>
extern void *__malloc(size_t);
extern void *__libc_malloc(size_t);
int
main(int argc, char **argv)
{
    void *v = malloc(200);
    free(v);
    v = __malloc(200);
    free(v);
    v = __libc_malloc(200);
    free(v);
    return 0;
}
When this is compiled with: gcc -static -o example example.c
and the main routine is disassembled, we see:
40103e: 55 push %rbp
40103f: 48 89 e5 mov %rsp,%rbp
401042: 48 83 ec 20 sub $0x20,%rsp
401046: 89 7d ec mov %edi,-0x14(%rbp)
401049: 48 89 75 e0 mov %rsi,-0x20(%rbp)
40104d: bf c8 00 00 00 mov $0xc8,%edi
401052: e8 19 52 00 00 callq 406270 <__libc_malloc>
401057: 48 89 45 f8 mov %rax,-0x8(%rbp)
40105b: 48 8b 45 f8 mov -0x8(%rbp),%rax
40105f: 48 89 c7 mov %rax,%rdi
401062: e8 09 56 00 00 callq 406670 <__cfree>
401067: bf c8 00 00 00 mov $0xc8,%edi
40106c: e8 ff 51 00 00 callq 406270 <__libc_malloc>
401071: 48 89 45 f8 mov %rax,-0x8(%rbp)
401075: 48 8b 45 f8 mov -0x8(%rbp),%rax
401079: 48 89 c7 mov %rax,%rdi
40107c: e8 ef 55 00 00 callq 406670 <__cfree>
401081: bf c8 00 00 00 mov $0xc8,%edi
401086: e8 e5 51 00 00 callq 406270 <__libc_malloc>
40108b: 48 89 45 f8 mov %rax,-0x8(%rbp)
40108f: 48 8b 45 f8 mov -0x8(%rbp),%rax
401093: 48 89 c7 mov %rax,%rdi
401096: e8 d5 55 00 00 callq 406670 <__cfree>
40109b: b8 00 00 00 00 mov $0x0,%eax
4010a0: c9 leaveq
4010a1: c3 retq
4010a2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4010a9: 00 00 00
4010ac: 0f 1f 40 00 nopl 0x0(%rax)
That is, all three calls go to the same address (406270), so the code doesn't differentiate the entries.
Now, if you use dynamic linking, you get a different result. For one thing, __malloc is not available in the resulting binary. This is because the __malloc name is a side effect of the static linking (there is a way to prevent it from being produced, but the mechanism escapes me at the moment). So when we compile the binary with dynamic linking (removing the __malloc call), main looks like:
40058d: 55 push %rbp
40058e: 48 89 e5 mov %rsp,%rbp
400591: 48 83 ec 20 sub $0x20,%rsp
400595: 89 7d ec mov %edi,-0x14(%rbp)
400598: 48 89 75 e0 mov %rsi,-0x20(%rbp)
40059c: bf c8 00 00 00 mov $0xc8,%edi
4005a1: e8 ea fe ff ff callq 400490 <malloc@plt>
4005a6: 48 89 45 f8 mov %rax,-0x8(%rbp)
4005aa: 48 8b 45 f8 mov -0x8(%rbp),%rax
4005ae: 48 89 c7 mov %rax,%rdi
4005b1: e8 9a fe ff ff callq 400450 <free@plt>
4005b6: bf c8 00 00 00 mov $0xc8,%edi
4005bb: e8 c0 fe ff ff callq 400480 <__libc_malloc@plt>
4005c0: 48 89 45 f8 mov %rax,-0x8(%rbp)
4005c4: 48 8b 45 f8 mov -0x8(%rbp),%rax
4005c8: 48 89 c7 mov %rax,%rdi
4005cb: e8 80 fe ff ff callq 400450 <free@plt>
4005d0: b8 00 00 00 00 mov $0x0,%eax
4005d5: c9 leaveq
4005d6: c3 retq
4005d7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
4005de: 00 00
So to determine whether __libc_malloc or malloc is used, you can check for calls to the PLT entry for the routine.
This of course all assumes that you're actually performing some type of static analysis of the binary. If you're doing this at run time, the usual method is library interception using LD_PRELOAD, which is a whole different question.