"Custom intrinsic" function for x64 instead of inline assembly possible?

I am currently experimenting with writing highly optimized, reusable functions for a library of mine. For instance, I wrote an "is power of 2" function as follows:

template<class IntType>  
inline bool is_power_of_two( const IntType x )
{
    return (x != 0) && ((x & (x - 1)) == 0);
}

This is a portable, low-maintenance implementation as an inline C++ template. VC++ 2008 compiles it to the following code, which contains branches:

is_power_of_two PROC
    test    rcx, rcx
    je  SHORT $LN3@is_power_o
    lea rax, QWORD PTR [rcx-1]
    test    rax, rcx
    jne SHORT $LN3@is_power_o
    mov al, 1
    ret 0
$LN3@is_power_o:
    xor al, al
    ret 0
is_power_of_two ENDP

I also found an implementation in "The bit twiddler", which would be coded in x64 assembly as follows:

is_power_of_two_fast PROC
    test rcx, rcx
    je  SHORT NotAPowerOfTwo
    lea rax, [rcx-1]
    and rax, rcx
    neg rax
    sbb rax, rax
    inc rax
    ret
NotAPowerOfTwo:
    xor rax, rax
    ret
is_power_of_two_fast ENDP

I tested both subroutines, written in a separate assembly module (.asm file) rather than in C++, and the second one is about 20% faster!

Yet the overhead of the function call is considerable: if I compare the second assembly implementation, "is_power_of_two_fast", to the inlined version of the template function, the latter is faster despite its branches!

Unfortunately, VC++ does not allow inline assembly when targeting x64. One is supposed to use "intrinsic functions" instead.

Now the question: can I implement the faster version "is_power_of_two_fast" as a custom intrinsic function or something similar, so that it can be used inline? Or alternatively, is it possible to somehow force the compiler to produce the low-branch version of the function?

asked Apr 04 '11 08:04

Angel Sinigersky

1 Answer

No, you cannot implement custom intrinsics; they are all built into the compiler. It is not only the instructions that are built in: the compiler also knows the semantics of each intrinsic and adapts the generated code to the surrounding code.
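To illustrate what using an existing, compiler-provided intrinsic looks like, here is a minimal sketch that tests for a power of two by counting set bits. The function name is_power_of_two_popcnt is hypothetical, and whether this beats the plain inline template is an assumption that would have to be measured. Note also that __popcnt64 on VC++ emits the POPCNT instruction directly, so it requires a CPU that supports it.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>   // declares __popcnt64 (x64 builds only)
#endif

// Hypothetical helper, not from the original post:
// a value is a power of two exactly when one bit is set.
inline bool is_power_of_two_popcnt( unsigned long long x )
{
#if defined(_MSC_VER)
    // VC++ intrinsic; assumes the target CPU supports POPCNT.
    return __popcnt64( x ) == 1;
#else
    // GCC/Clang builtin; falls back to a software routine if needed.
    return __builtin_popcountll( x ) == 1;
#endif
}
```

Because the compiler knows what the intrinsic does, a call such as is_power_of_two_popcnt(n) can be inlined and scheduled together with the surrounding code, which a call into a separate .asm module cannot.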

One reason inline assembly was removed for x86-64 is that inserting assembly into the middle of a function disturbs the optimizer and often results in less well-optimized code around the assembler code. That can easily be a net loss!

The only real use for intrinsics is for "interesting" special instructions that the compiler cannot generate from C or C++ constructs, like BSF or BSR. Almost everything else will work better using inline functions, like your template above.
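As a sketch of how far a plain inline function can go, the variant below replaces the short-circuiting && of the question's template with a bitwise &. Both comparisons are then always evaluated, so the compiler is free to emit flag-setting instructions instead of jumps. The name is_power_of_two_nobranch is hypothetical, and whether a given compiler actually produces branch-free code here is an assumption: inspect the generated assembly before relying on it.

```cpp
// Hypothetical branch-reduced variant of the template from the question.
// Intended for unsigned integer types; for signed types, x - 1 can
// overflow at the minimum value, just as in the original template.
template<class IntType>
inline bool is_power_of_two_nobranch( const IntType x )
{
    // Bitwise & (not &&): no short-circuit, so no jump is required
    // to skip the second test when x is zero.
    return ( (x != 0) & ((x & (x - 1)) == 0) ) != 0;
}
```

It is called exactly like the original, for example is_power_of_two_nobranch(64u), and returns the same results for every input.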

If you need to do something special that the compiler does not understand, the only real option is to write the entire function as a separate assembler module. If the call overhead for that function is too expensive, the optimization probably wasn't worth that much in the first place.

Trust your compiler(tm)!

answered Nov 16 '22 01:11

Bo Persson

Tags: c++, assembly, 64-bit, intrinsics, inline-assembly
