I have recently been made aware of GCC's built-in functions for some of the C library's memory management functions, specifically __builtin_malloc() and related built-ins (see https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html). Upon learning about __builtin_malloc(), I wondered how it could provide performance improvements over plain malloc() and the related library routines.
For example, if the function succeeds, it has to provide a block that can be freed by a call to plain free(), since the pointer might be freed by a module that was compiled without __builtin_malloc() or __builtin_free() enabled. (Or am I wrong about this, and if __builtin_malloc() is used, must the builtins be used globally?) Therefore the allocated object has to be something that can be managed by the same data structures that plain malloc() and free() work with.
I can't find any details of how __builtin_malloc() works or what exactly it does (I'm not a compiler dev, so spelunking through GCC source code isn't in my wheelhouse). In some simple tests where I called __builtin_malloc() directly, it simply ended up being emitted in the object code as a call to plain malloc(). However, there might be subtleties or platform details that these simple tests don't capture.
What kinds of performance improvements can __builtin_malloc() provide over a call to plain malloc()? Does __builtin_malloc() depend on the rather complex data structures that glibc's malloc() implementation uses? Or, conversely, does glibc's malloc()/free() contain code to deal with blocks that might have been allocated by __builtin_malloc()?
Basically, how does it work?
Just because malloc returns zero-initialized memory the first time doesn't mean you can count on it in general. The memory could also have been set to 0 by the operating system, with malloc having had nothing to do with it.
malloc() takes a single argument (the amount of memory to allocate in bytes), while calloc() takes two arguments — the number of elements and the size of each element. malloc() only allocates memory, while calloc() allocates and sets the bytes in the allocated region to zero.
malloc() allocates a memory block of the given size (in bytes) and returns a pointer to the beginning of the block. It does not initialize the allocated memory.
malloc doesn't initialize memory to zero. It returns the memory to you as is, without touching it or changing its contents.
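I believe there is no special GCC-internal implementation of __builtin_malloc(). Rather, it exists as a builtin only so that calls to it can be optimized away under certain circumstances. When a call is not optimized away, it is emitted as a plain call to the library's malloc() (as you saw in your tests), so the resulting block can be freed with the ordinary free().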
Take this example:
```c
#include <stdlib.h>

int main(void)
{
    int *p = malloc(4);
    *p = 7;
    free(p);
    return 0;
}
```
If we disable builtins (with -fno-builtin) and look at the generated output:

```
$ gcc -fno-builtin -O1 -Wall -Wextra builtin_malloc.c && objdump -d -Mintel a.out
0000000000400580 <main>:
  400580:  48 83 ec 08           sub    rsp,0x8
  400584:  bf 04 00 00 00        mov    edi,0x4
  400589:  e8 f2 fe ff ff        call   400480 <malloc@plt>
  40058e:  c7 00 07 00 00 00     mov    DWORD PTR [rax],0x7
  400594:  48 89 c7              mov    rdi,rax
  400597:  e8 b4 fe ff ff        call   400450 <free@plt>
  40059c:  b8 00 00 00 00        mov    eax,0x0
  4005a1:  48 83 c4 08           add    rsp,0x8
  4005a5:  c3                    ret
```
Calls to malloc/free are emitted, as expected.
However, if we allow malloc to be a builtin (GCC's default):

```
$ gcc -O1 -Wall -Wextra builtin_malloc.c && objdump -d -Mintel a.out
00000000004004f0 <main>:
  4004f0:  b8 00 00 00 00        mov    eax,0x0
  4004f5:  c3                    ret
```
Everything in main() except the return 0 was optimized away: the allocation, the store, and the free are all gone!
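Essentially, when malloc is treated as a builtin, GCC knows what malloc and free do, so it is free to eliminate the calls when the result is never used. Here the pointer is only written through and then freed, so removing the whole malloc/free pair has no observable side effects.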
It's the same mechanism that allows "wasteful" calls to printf (here, a constant format string with no conversions that ends in a newline) to be replaced with calls to puts:
```c
#include <stdio.h>

int main(void)
{
    printf("hello\n");
    return 0;
}
```
Builtins disabled:
```
$ gcc -fno-builtin -O1 -Wall builtin_printf.c && objdump -d -Mintel a.out
0000000000400530 <main>:
  400530:  48 83 ec 08           sub    rsp,0x8
  400534:  bf e0 05 40 00        mov    edi,0x4005e0
  400539:  b8 00 00 00 00        mov    eax,0x0
  40053e:  e8 cd fe ff ff        call   400410 <printf@plt>
  400543:  b8 00 00 00 00        mov    eax,0x0
  400548:  48 83 c4 08           add    rsp,0x8
  40054c:  c3                    ret
```
Builtins enabled:
```
$ gcc -O1 -Wall builtin_printf.c && objdump -d -Mintel a.out
0000000000400530 <main>:
  400530:  48 83 ec 08           sub    rsp,0x8
  400534:  bf e0 05 40 00        mov    edi,0x4005e0
  400539:  e8 d2 fe ff ff        call   400410 <puts@plt>
  40053e:  b8 00 00 00 00        mov    eax,0x0
  400543:  48 83 c4 08           add    rsp,0x8
  400547:  c3                    ret
```