If I want to expose a single machine-specific instruction to the programmer, there are two ways I can do so: add a builtin/intrinsic to the compiler, or use inline asm().
I have read that builtins allow the compiler to take care of type checking, register allocation, "other optimizations", etc. But the compiler needs to do those things even in the case of asm(), right? So what precisely is the performance benefit of using an intrinsic over asm() for a single instruction?
How does the trade-off change if multiple machine instructions are involved?
The "portability" argument in favor of intrinsic is understandable, but I am curious to understand the performance advantage, if any, of one over the other.
I think it depends a lot on what you're doing. Modifying GCC, and requiring a modified GCC to build your program unless/until your GCC patch makes it upstream, is a lot more of a headache than just using inline asm.
If the instruction you want to use has an abstract meaning that isn't tied to a particular instruction set architecture, adding the builtin/intrinsic is probably the "right" choice: the same code using it can then automatically work on all targets, with a fallback to a more complex, multi-instruction implementation on targets that lack the instruction. It might still not be practical, though.
If the instruction is something very ISA-specific, obscure, not performance-critical, etc. (I'm thinking of loading a special hardware register, a CPU mode register, getting model info, etc., but I'm sure you can think of other examples), then just using inline asm is almost certainly the right solution.
Even if you do think a builtin is the "right" solution for your problem, but need to take the inline asm approach for practical reasons, you can still abstract it with a macro or static inline function in such a way that it's easy to replace all uses with an intrinsic later (or with a fallback implementation on targets without the instruction).