I am debugging a transactional processing system which is performance sensitive.
I found code that uses __builtin_memcpy and __builtin_memset instead of memcpy and memset.
What are __builtin_ functions for? Are they meant to avoid dependency problems on a particular architecture or compiler?
Or is there a performance reason why __builtin_ functions are preferred?
Thank you :D
Traditionally, the standard library memcpy is just a call to a function. Unfortunately, memcpy is often called for very small copies, and the overhead of calling a function, shuffling a few bytes and returning is considerable (especially since memcpy adds extra code at the beginning of the function to deal with unaligned memory, loop unrolling, etc., so that it performs well on LARGE copies).
So, for the compiler to optimise such calls, it needs to "know" how to do, for example, memcpy. The solution is to build the function into the compiler, which then contains code such as this:
/* Compiler-internal pseudocode: runs when the compiler sees a memcpy call */
int generate_builtin_memcpy(expr arg1, expr arg2, expr size)
{
    if (is_constant(size) && eval(size) < SOME_NUMBER)
    {
        ... do magic inline memory copy ...
    }
    else
    {
        ... call "real" memcpy ...
    }
}
[For retargetable compilers, there is typically one of these functions per CPU architecture, each configured differently as to when the "real" memcpy gets called and when an inline copy is used.]
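As a rough user-level analogue of that decision (a sketch only, not how GCC or clang actually implement it), the hypothetical small_copy below dispatches a few small sizes to fixed-size __builtin_memcpy calls, which GCC and clang normally expand inline, and sends everything else to the real memcpy:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mimics the compiler's "small constant size ->
   inline copy, otherwise -> library call" choice at run time. */
static inline void *small_copy(void *dst, const void *src, size_t n)
{
    switch (n) {
    /* Each case has a constant size, so the builtin is usually
       expanded into plain loads and stores. */
    case 1: __builtin_memcpy(dst, src, 1); break;
    case 2: __builtin_memcpy(dst, src, 2); break;
    case 4: __builtin_memcpy(dst, src, 4); break;
    case 8: __builtin_memcpy(dst, src, 8); break;
    /* Any other size goes to the "real" library memcpy. */
    default: memcpy(dst, src, n); break;
    }
    return dst;
}
```

This only pays off when n is not known at compile time; when it is a constant, the compiler already makes this choice for plain memcpy calls.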
The key point is that you MAY write your own memcpy function that ISN'T based on __builtin_memcpy() (which ALWAYS means the compiler's built-in version) and that doesn't do the same thing as the normal memcpy. [You'd be in a bit of trouble if you changed its behaviour a lot, since the C standard library probably calls memcpy in a few thousand places, but collecting statistics on how many times memcpy is called, and what sizes are copied, could be one such use case.]
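A minimal sketch of the statistics use case, assuming GCC or clang; stats_memcpy, copy_calls and copy_bytes are hypothetical names. The wrapper counts calls and bytes, then does the copy through __builtin_memcpy:

```c
#include <stddef.h>

/* Hypothetical counters; a real tool might also bucket by size. */
static unsigned long copy_calls;
static unsigned long copy_bytes;

/* Records the call, then copies via the compiler's builtin. For a
   non-constant n the builtin typically becomes a call to the
   library memcpy. */
void *stats_memcpy(void *dst, const void *src, size_t n)
{
    copy_calls++;
    copy_bytes += n;
    return __builtin_memcpy(dst, src, n);
}
```

You would call stats_memcpy instead of memcpy in the code paths you want to measure, then inspect the counters.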
Another big reason for using __builtin_* is that they provide functionality that would otherwise have to be written in inline assembler, or that might not be available to the programmer at all. Setting or getting special registers is one example.
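For illustration, two builtins that GCC and clang document: __builtin_return_address reads the current function's return address (a stack slot or link register, depending on the target), which plain C cannot reach, and __builtin_popcount gives access to a population-count instruction, where the target has one, that would otherwise need inline assembler to use directly. The wrapper names are hypothetical:

```c
#include <stddef.h>

/* noinline: the function must keep its own frame, otherwise the
   "return address" would belong to whatever it was inlined into. */
__attribute__((noinline)) void *caller_address(void)
{
    /* Level 0 = return address of the current function. */
    return __builtin_return_address(0);
}

/* Number of set bits in x; may compile to a single instruction
   when the target supports it (e.g. x86 with -mpopcnt). */
unsigned bits_set(unsigned x)
{
    return (unsigned)__builtin_popcount(x);
}
```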
There are other techniques to solve this problem. For example, clang/LLVM has a library-call simplification pass (SimplifyLibCalls) that recognises calls to common library functions and replaces them with cheaper alternatives: since printf is much "heavier" than puts, it replaces a suitable printf("constant string with no formatting\n") with puts("constant string with no formatting"), and many trigonometric and other math functions are folded into simple constant values when called with constant arguments, etc.
Calling __builtin_* directly for functions like memcpy or sin is probably the WRONG thing to do: it just makes your code less portable, and it is not at all certain to be faster. Calling __builtin_special_function when there is no other way is typically the solution in some tricky situations, but you should probably wrap it in your own function, e.g.
/* __builtin_get_magic_property is a placeholder name, not a real builtin */
int get_magic_property(void)
{
    return __builtin_get_magic_property();
}
That way, when you port to Windows, you can easily do:
int get_magic_property(void)
{
#ifdef _WIN32
    return Win32GetMagicPropertyEx();   /* placeholder Windows API name */
#else
    return __builtin_get_magic_property();
#endif
}
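The same wrapping pattern with a real builtin, as a sketch: lowest_set_bit is a hypothetical name. It uses GCC/clang's __builtin_ctzl, MSVC's _BitScanForward intrinsic, and a plain loop for any other compiler:

```c
#if defined(_MSC_VER) && !defined(__GNUC__) && !defined(__clang__)
#include <intrin.h>
#endif

/* Index of the lowest set bit of x. x must be non-zero: the
   GCC/clang builtin is undefined for 0, and the loop would not end. */
static inline unsigned lowest_set_bit(unsigned long x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctzl(x);
#elif defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, x);
    return (unsigned)index;
#else
    /* Portable fallback: shift until the lowest bit is set. */
    unsigned n = 0;
    while ((x & 1UL) == 0) {
        x >>= 1;
        n++;
    }
    return n;
#endif
}
```

Callers only ever see lowest_set_bit, so porting to a new compiler means touching one function.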
__builtin_* functions are optimised functions provided by the compiler and its support libraries. These might be built-in versions of standard library functions, such as memcpy, and perhaps more typically some of the maths functions.
Alternatively, they might be highly optimised functions for typical tasks on that particular target, e.g. a DSP compiler might provide built-in FFT functions.
Which functions are provided as __builtin_ is determined by the developers of the compiler, and will be documented in the compiler's manuals.
Different CPU types and compilers are designed for different use cases, and this will be reflected in the range of built-in functions provided.
Built-in functions might make use of specialised instructions in the target processor, or might trade off accuracy for speed by using lookup tables rather than calculating values directly, or any other reasonable optimisation, all of which should be documented.
They are definitely not there to reduce dependency on a particular compiler or CPU; in fact, quite the opposite. They add a dependency, so uses of them are often wrapped in preprocessor checks, e.g.
#ifdef SOME_CPU_FLAG
#define MEMCPY __builtin_memcpy
#else
#define MEMCPY memcpy
#endif
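On GCC 10+ and clang, one concrete replacement for a placeholder check like SOME_CPU_FLAG is the __has_builtin operator. In this sketch, copy_header is a hypothetical example function:

```c
#include <string.h>

/* Use the builtin only if the compiler reports having it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_memcpy)
#    define MEMCPY __builtin_memcpy
#  endif
#endif

#ifndef MEMCPY
#  define MEMCPY memcpy   /* portable fallback */
#endif

/* Copies a fixed 16-byte header. */
void copy_header(char *dst, const char *src)
{
    MEMCPY(dst, src, 16);
}
```

The two #if lines are nested on purpose: on a compiler without __has_builtin, writing __has_builtin(...) in the same #if as the defined() test would be a preprocessor error.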
On a compiler note: __builtin_memcpy can fall back to emitting a memcpy function call. This also gives less-capable compilers a way to keep things simple: they can always choose the slow path and unconditionally emit a memcpy call.
http://lwn.net/Articles/29183/
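A common place where this matters in performance-sensitive code is reading fixed-size fields from a byte buffer. In this sketch (load_u32 and store_u32 are hypothetical names), the constant size lets the compiler replace __builtin_memcpy with a plain load or store on targets that allow unaligned access, while a non-constant size would typically become a memcpy call:

```c
#include <stdint.h>

/* Read a host-byte-order 32-bit value from a possibly unaligned buffer. */
static inline uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    __builtin_memcpy(&v, p, sizeof v);   /* constant size: usually inlined */
    return v;
}

/* Write a host-byte-order 32-bit value to a possibly unaligned buffer. */
static inline void store_u32(unsigned char *p, uint32_t v)
{
    __builtin_memcpy(p, &v, sizeof v);
}
```

Going through memcpy rather than casting the pointer to uint32_t * avoids both alignment faults and strict-aliasing problems.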