
Arm Neon Intrinsics vs hand assembly

https://web.archive.org/web/20170227190422/http://hilbert-space.de/?p=22

That site, which is quite dated, shows that hand-written assembly gives a much greater improvement than the intrinsics do. I am wondering whether this is still true now, in 2012.

Has the GNU cross compiler's optimization of intrinsics improved since then?

George Host asked Mar 22 '12 18:03




1 Answer

So this question is four years old now, and it still shows up in search results...

In 2016 things are much better.

A lot of simple code that I've transcribed from assembly to intrinsics is now optimised better by the compilers than by me, because I'm too lazy to do the pipeline scheduling work (for how many different pipelines now?), while the compilers just need me to pass the right -mtune=.
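As a rough illustration of the kind of simple code meant here, this is a minimal sketch of a NEON intrinsics loop in C. The function name add_f32 is hypothetical, and the build line in the comment assumes a 32-bit hard-float cross toolchain and a Cortex-A9 target; neither comes from the original answer. The scalar fallback is only there so the file also builds on targets without NEON.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/*
 * Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
 *
 * Assumed build line (toolchain name and CPU are illustrative only):
 *   arm-linux-gnueabihf-gcc -O2 -mfpu=neon -mtune=cortex-a9 -c add.c
 */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Four floats per iteration. Register allocation and instruction
     * scheduling are left to the compiler, guided by -mtune. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar tail; also the whole loop on targets without NEON. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Changing only the -mtune value and rebuilding is then enough to get a schedule for a different core, which is the work a hand-written assembly version would need redone by hand.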

For complex code where register allocation can get tight, GCC and Clang can both still produce code that is slower than handwritten assembly by a factor of two... or three(ish). The slowdown comes mostly from register spills, so you should know from the structure of your code whether that's a risk.

But both compilers sometimes have disappointing accidents. I'd say that right now it's worth the risk (although I'm paid to take risks), and if you do get hit by something, file a bug. That way things will keep getting better.

sh1 answered Oct 06 '22 09:10