
"Custom intrinsic" function for x64 instead of inline assembly possible?

I am currently experimenting with the creation of highly-optimized, reusable functions for a library of mine. For instance, I write the function "is power of 2" the following way:

template<class IntType>  
inline bool is_power_of_two( const IntType x )
{
    return (x != 0) && ((x & (x - 1)) == 0);
}

This is a portable, low-maintenance implementation as an inline C++ template. This code is compiled by VC++ 2008 to the following code with branches:

is_power_of_two PROC
    test    rcx, rcx
    je  SHORT $LN3@is_power_o
    lea rax, QWORD PTR [rcx-1]
    test    rax, rcx
    jne SHORT $LN3@is_power_o
    mov al, 1
    ret 0
$LN3@is_power_o:
    xor al, al
    ret 0
is_power_of_two ENDP

I also found an implementation in "The bit twiddler", which would be coded in x64 assembly as follows:

is_power_of_two_fast PROC
    test rcx, rcx
    je  SHORT NotAPowerOfTwo
    lea rax, [rcx-1]
    and rax, rcx
    neg rax
    sbb rax, rax
    inc rax
    ret
NotAPowerOfTwo:
    xor rax, rax
    ret
is_power_of_two_fast ENDP

I tested both subroutines as standalone routines in an assembly module (.asm file), called from C++, and the second one runs about 20% faster!

Yet the overhead of the function call is considerable: if I compare the second assembly implementation "is_power_of_two_fast" to the inlined version of the template function, the latter is faster despite its branches!

Unfortunately, Visual C++ does not allow inline assembly when targeting x64. One is supposed to use "intrinsic functions" instead.

Now the question: can I implement the faster version "is_power_of_two_fast" as a custom intrinsic function or something similar, so that it can be used inline? Or alternatively, is it possible to somehow force the compiler to produce the low-branch version of the function?

Angel Sinigersky asked Apr 04 '11 08:04



1 Answer

No, you cannot implement custom intrinsics; they are all built into the compiler. It is not only the instructions that are built in: the compiler also knows the semantics of each intrinsic and adapts the generated code to the surrounding code.

One reason for inline assembly being removed for x86-64 is that inserting assembly into the middle of a function disturbs the optimizer, and often results in less well optimized code around the assembler code. There can easily be a net loss there!

The only real use for intrinsics is for "interesting" special instructions that the compiler cannot generate from C or C++ constructs, like BSF or BSR. Almost everything else will work better using inline functions, like your template above.
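As an illustration of that kind of intrinsic, here is a minimal sketch that wraps the bit-scan-reverse operation in an ordinary inline function. The helper name highest_set_bit is hypothetical and not from the question. The sketch assumes a 64-bit target: under Visual C++ it uses _BitScanReverse64 from <intrin.h>, and under GCC or Clang it falls back to __builtin_clzll. Which instruction actually gets emitted depends on the compiler and its settings.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>   // declares _BitScanReverse64 (64-bit targets only)
#endif

// Hypothetical helper: zero-based index of the highest set bit of x.
// Precondition: x != 0. The result is not meaningful for x == 0.
inline unsigned highest_set_bit(std::uint64_t x)
{
#if defined(_MSC_VER)
    unsigned long index = 0;
    _BitScanReverse64(&index, x);          // writes the bit index into 'index'
    return static_cast<unsigned>(index);
#else
    // GCC/Clang builtin: counts leading zero bits, undefined for x == 0.
    return 63u - static_cast<unsigned>(__builtin_clzll(x));
#endif
}
```

Because the wrapper is a normal inline function, the compiler can still inline it and optimize the code around the call, which is the advantage over an external .asm routine.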

If you need to do something special, that the compiler does not understand, the only real option is to write the entire function as a separate assembler module. If the call overhead for that function is too expensive, the optimization probably wasn't worth that much in the first place.
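Before moving to a separate assembler module, it may be worth rephrasing the inline template so that the compiler has less reason to branch. The sketch below is one such variant (the name is_power_of_two_branchless is hypothetical): it replaces the short-circuiting && with a bitwise &, so both comparisons are always evaluated. Whether a given compiler really emits branch-free code for it is an assumption that should be checked in the generated assembly. Like the original template, it is intended for unsigned integer types.

```cpp
// Variant of the question's template using bitwise & instead of &&.
// Both operands are always evaluated, so no short-circuit jump is required.
// For x == 0 the right-hand side is computed as well, but the left-hand
// side is false, so the overall result is still false.
template<class IntType>
inline bool is_power_of_two_branchless(const IntType x)
{
    return (x != 0) & ((x & (x - 1)) == 0);
}
```

It is called exactly like the original, for example is_power_of_two_branchless(64u), and remains visible to the optimizer at every call site.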

Trust your compiler(tm)!

Bo Persson answered Nov 16 '22 01:11