What are __builtin_* functions for in C++?

Question

I am debugging a transactional processing system which is performance sensitive.

I found some code that uses __builtin_memcpy and __builtin_memset instead of memcpy and memset.

What are the __builtin_ functions for? Are they there to avoid dependency problems across architectures or compilers?

Or is there a performance reason why the __builtin_ functions are preferred?

thank you :D

Mats Petersson · Accepted Answer

With a traditional library, the standard memcpy is just a call to a function. Unfortunately, memcpy is often called for very small copies, and the overhead of calling a function, shuffling a few bytes and returning is considerable (especially since memcpy does extra work at the start of the function to deal with unaligned memory, loop unrolling, etc., so that it does well on LARGE copies).

So, for the compiler to optimise those calls, it needs to "know" how to do, for example, memcpy. The solution is to have the function built into the compiler, which then contains code such as this:

 // Pseudocode for what the compiler does internally when it sees memcpy.
 int generate_builtin_memcpy(expr arg1, expr arg2, expr size)
 {
     if (is_constant(size) && eval(size) < SOME_NUMBER)
     {
         ... do magic inline memory copy ...
     }
     else
     {
         ... call "real" memcpy ...
     }
 }

[For retargetable compilers, there is typically one of these functions for each CPU architecture, each with its own configuration for when the "real" memcpy gets called and when an inline memcpy is used.]
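As a rough illustration of the case the compiler handles well, here is a minimal sketch. The names Header and load_header are made up for this example. With GCC or Clang at -O2, a copy whose size is a small compile-time constant is usually expanded inline as a few loads and stores, whether it is spelled memcpy or __builtin_memcpy; the exact threshold depends on the compiler and the target.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte record, used only for this illustration.
struct Header {
    std::uint32_t id;
    std::uint32_t length;
};

static_assert(sizeof(Header) == 8, "example assumes no padding in Header");

// sizeof(Header) is a small compile-time constant, so an optimising
// compiler will usually expand this copy inline rather than call memcpy.
inline Header load_header(const unsigned char* bytes)
{
    Header h;
    std::memcpy(&h, bytes, sizeof h);
    return h;
}
```

Comparing the generated assembly at -O0 and -O2 is the easiest way to see whether a call to memcpy remains.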

The key here is that you MAY actually write your own memcpy function that ISN'T based on __builtin_memcpy(), that is ALWAYS called as a function, and that doesn't do exactly the same thing as the normal memcpy [you'd be in a bit of trouble if you changed its behaviour a lot, since the C standard library probably calls memcpy in a few thousand places, but gathering statistics on how many times memcpy is called, and what sizes are copied, could be one such use case].
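The statistics use case could look roughly like the sketch below. Note the assumption: rather than replacing memcpy itself, this uses a separately named wrapper (counted_memcpy, CopyStats and copy_stats are hypothetical names), which avoids redefining a standard library function. The counters are not thread-safe.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical counters for this illustration (not thread-safe).
struct CopyStats {
    std::size_t calls = 0;
    std::size_t bytes = 0;
};

// Single shared instance of the counters.
inline CopyStats& copy_stats()
{
    static CopyStats stats;
    return stats;
}

// Records the call and its size, then forwards to the normal memcpy.
inline void* counted_memcpy(void* dst, const void* src, std::size_t n)
{
    CopyStats& s = copy_stats();
    ++s.calls;
    s.bytes += n;
    return std::memcpy(dst, src, n);
}
```

Reading copy_stats().calls and copy_stats().bytes after a run gives the number of copies and the total bytes copied.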

Another big reason for using __builtin_* is that they provide functionality that would otherwise have to be written in inline assembler, or that might not be available to the programmer at all. Setting or getting special registers would be one such thing.
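A simpler case of the same idea than special registers is bit counting. The sketch below assumes GCC or Clang, where __builtin_popcount and __builtin_clz exist; the wrapper names count_set_bits and highest_set_bit are made up for this example. On targets that have a suitable instruction the compiler can use it directly, with no inline assembler in the source.

```cpp
#include <climits>

// Assumes GCC or Clang; these builtins are not available on MSVC.

// Number of set bits in x.
inline int count_set_bits(unsigned x)
{
    return __builtin_popcount(x);
}

// Index of the highest set bit, or -1 if x is 0.
// __builtin_clz has an undefined result for 0, so that case is handled first.
inline int highest_set_bit(unsigned x)
{
    if (x == 0)
        return -1;
    return static_cast<int>(sizeof(unsigned) * CHAR_BIT) - 1 - __builtin_clz(x);
}
```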

There are other techniques to solve this problem. For example, clang (through LLVM's library-call simplification, historically the SimplifyLibCalls pass) assumes that library calls do what the standard says and replaces them with cheaper alternatives: since printf is much "heavier" than puts, it turns a suitable printf("constant string with no formatting\n") into puts("constant string with no formatting"), and many trigonometric and other math functions are folded into simple constant values when called with constants.

Calling __builtin_* directly for functions like memcpy or sin is probably the WRONG thing to do: it makes your code less portable and is not at all certain to be faster. Calling __builtin_special_function when there is no alternative is typically the solution in some tricky situations, but you should probably wrap it in your own function, e.g.

int get_magic_property()
{
    return __builtin_get_magic_property(); 
}

That way, when you port to Windows, you can easily do:

int get_magic_property()
{
#ifdef _WIN32   // predefined by compilers targeting Windows
    return Win32GetMagicPropertyEx();
#else
    return __builtin_get_magic_property();
#endif
}
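The wrapper above uses placeholder names. A concrete sketch of the same pattern, assuming the goal is "index of the lowest set bit": MSVC offers the _BitScanForward intrinsic, GCC and Clang offer __builtin_ctz, and anything else gets a plain loop. The function name lowest_set_bit is made up for this example.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Index of the lowest set bit. Precondition: x != 0.
inline int lowest_set_bit(unsigned x)
{
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, x);      // MSVC intrinsic
    return static_cast<int>(index);
#elif defined(__GNUC__)              // defined by both GCC and Clang
    return __builtin_ctz(x);
#else
    int index = 0;                   // portable fallback
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
#endif
}
```

Callers only ever see lowest_set_bit, so the compiler-specific part stays in one place.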

mjs · Answer

__builtin_* functions are optimised functions provided by the compiler libraries. These might be builtin versions of standard library functions, such as memcpy, and perhaps more typically some of the maths functions.

Alternatively, they might be highly optimised functions for typical tasks on that particular target, e.g. a DSP might have built-in FFT functions.

Which functions are provided as __builtin_ is determined by the developers of the compiler, and will be documented in the manuals for the compiler.

Different CPU types and compilers are designed for different use cases, and this will be reflected in the range of built-in functions provided.

Built-in functions might make use of specialised instructions in the target processor, or might trade off accuracy for speed by using lookup tables rather than calculating values directly, or any other reasonable optimisation, all of which should be documented.

These are definitely not there to reduce dependency on a particular compiler or CPU; in fact, quite the opposite. Using them actually adds a dependency, so they might be wrapped in preprocessor checks, e.g.

#ifdef SOME_CPU_FLAG
#define MEMCPY __builtin_memcpy
#else
#define MEMCPY memcpy
#endif
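A variant of this check asks the compiler directly instead of relying on a hand-maintained flag. This is a sketch that assumes a compiler providing __has_builtin (Clang, and GCC 10 or later); on anything else the check evaluates to 0 and plain memcpy is used. The helper copy_int is a made-up name, there only to show the macro in use.

```cpp
#include <cstring>

// Compilers without __has_builtin get a stub that always answers "no".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_memcpy)
#define MEMCPY __builtin_memcpy
#else
#define MEMCPY std::memcpy
#endif

// Hypothetical helper: copies an int through MEMCPY.
inline int copy_int(int value)
{
    int out;
    MEMCPY(&out, &value, sizeof out);
    return out;
}
```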

anandmadhab · Answer

On the compiler side, __builtin_memcpy can fall back to emitting a call to the memcpy function. This also gives less capable compilers the ability to simplify things by taking the slow path of unconditionally emitting a memcpy call.

http://lwn.net/Articles/29183/

Tags:

c++

compilation

syko
