 

When will compilers optimize assembly code in C/C++ source? [closed]

Most compilers (VS2015, gcc) do not optimize inline assembly code, which lets us use new instructions that the compiler doesn't support yet.

But when should a C/C++ compiler start optimizing inline assembly?

c2416726 asked May 19 '26 09:05


1 Answer

Never. That would defeat the purpose of inline assembly, which is to get exactly what you ask for.

If you want to use the full power of the target CPU's instruction set in a way that the compiler can understand and optimize, you should use intrinsic functions, not inline asm.

For example, instead of inline asm for popcnt, use int count = __builtin_popcount(x); (in GNU C, compiled with -mpopcnt). Inline asm is compiler-specific too, so if anything intrinsics are more portable, especially Intel's x86 intrinsics, which are supported across all the major compilers that can target x86. Use #include <x86intrin.h> and you can call int _popcnt32 (int a) to reliably get the x86 popcnt instruction. See Intel's intrinsics finder/guide, and other links in the x86 tag wiki.
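As a rough illustration of the intrinsic route, here is a minimal sketch for GCC or clang. The wrapper name popcount32 is made up for this example. It assumes the compiler defines __POPCNT__ when -mpopcnt (or a -march option that implies it) is in effect, and falls back to the GNU C builtin otherwise.

```c
#include <stdint.h>

#if defined(__POPCNT__)
#include <x86intrin.h>   /* GCC/clang umbrella header for x86 intrinsics */
#endif

/* popcount32 is a hypothetical wrapper name used only for this sketch. */
static inline int popcount32(uint32_t x)
{
#if defined(__POPCNT__)
    /* Built with -mpopcnt: use Intel's intrinsic for the popcnt instruction. */
    return _popcnt32(x);
#else
    /* Otherwise let the compiler decide how to count bits
       (it may emit a bit-trick sequence or a library call). */
    return __builtin_popcount(x);
#endif
}
```

Either way the compiler sees an operation it understands, so a call such as popcount32(3) with a constant argument can be folded at compile time when optimization is enabled.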


int count(){ 
  int total = 0;
  for(int i=0 ; i<4 ; ++i)
    total += popc(i);
  return total;
}

Compiled with #define popc _popcnt32 by gcc6.3:

    mov     eax, 4
    ret

clang 3.9 with an inline-asm definition of popc, on the Godbolt compiler explorer:

    xor     eax, eax
    popcnt  eax, eax
    mov     ecx, 1
    popcnt  ecx, ecx
    add     ecx, eax
    mov     edx, 2
    popcnt  edx, edx
    add     edx, ecx
    mov     eax, 3
    popcnt  eax, eax
    add     eax, edx
    ret

This is a classic example of inline asm defeating constant propagation, and why you shouldn't use it for performance if you can avoid it: https://gcc.gnu.org/wiki/DontUseInlineAsm.


This was the inline-asm definition I used for this test:

int popc_asm(int x) {
  // force use of the same register because popcnt has a false dependency on its output, on Intel hardware
  // this is just a toy example, though, and also demonstrates how non-optimal constraints can lead to worse code
  asm("popcnt %0,%0" : "+r"(x));
  return x;
}

If you didn't know that popcnt has a false dependency on its output register on Intel hardware, that's another reason you should leave it to the compiler whenever possible.
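To reproduce the comparison, the pieces above can be put into one file, roughly like this. The USE_INLINE_ASM macro is an assumption of this sketch, not something from the original test: define it (on an x86 target whose CPU supports popcnt) to get the inline-asm version, or leave it undefined to get the builtin.

```c
#if defined(USE_INLINE_ASM) && (defined(__x86_64__) || defined(__i386__))
/* Opaque to the optimizer: the compiler only knows that x is read and
   written in a register, not what the instruction computes. */
static inline int popc(int x)
{
    __asm__("popcnt %0,%0" : "+r"(x));
    return x;
}
#else
/* Transparent to the optimizer: constant arguments can be folded. */
static inline int popc(int x)
{
    return __builtin_popcount((unsigned)x);
}
#endif

/* Same loop as in the answer: popcount of 0, 1, 2 and 3 summed up. */
int count(void)
{
    int total = 0;
    for (int i = 0; i < 4; ++i)
        total += popc(i);
    return total;
}
```

Compiling each variant with optimization enabled and comparing the generated asm (for example with -S, or on the Godbolt compiler explorer) should show the same kind of difference as the listings above, though the exact output depends on the compiler version and options.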


Using special instructions that the compiler doesn't know about is one use-case for inline asm, but if the compiler doesn't know about it, it certainly can't optimize it. Before compilers were good at optimizing intrinsics (e.g. for SIMD instructions), inline asm for this kind of thing was more common. But we're many years beyond that now, and compilers are generally good with intrinsics, even for non-x86 architectures like ARM.

Peter Cordes answered May 20 '26 23:05