C++ Tips for code optimization on ARM devices

I have been developing C++ code for augmented reality on ARM devices, and optimizing the code is very important in order to keep a good frame rate. To raise efficiency as far as possible, I think it is worth gathering general tips that make life easier for the compiler and reduce the number of cycles the program needs. Any suggestions are welcome.

1- Avoid high-cost instructions: division, square root, sin, cos

  • Use logical shifts to multiply or divide by powers of 2.
  • Multiply by the inverse when possible.
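As a rough illustration of these two bullets, here is a minimal sketch. The helper names (div_pow2, mul_pow2, scale_by_inverse) are made up for this example and do not come from any library. Compilers usually turn an unsigned division by a constant power of two into a shift on their own, so the manual shift mostly matters when the exponent is only known at run time.

```cpp
#include <cassert>
#include <cstdint>

// Divide an unsigned value by 2^k with a logical right shift.
// Only use this on unsigned operands: shifting a negative signed value
// does not round the same way as signed division.
inline std::uint32_t div_pow2(std::uint32_t x, unsigned k) {
    assert(k < 32);  // shifting by the full bit width or more is undefined
    return x >> k;
}

// Multiply an unsigned value by 2^k with a left shift (wraps on overflow).
inline std::uint32_t mul_pow2(std::uint32_t x, unsigned k) {
    assert(k < 32);
    return x << k;
}

// Scale every element by 1/d: one division up front, then only multiplies.
// The result can differ from data[i] / d in the last bit, because the
// reciprocal itself is rounded.
inline void scale_by_inverse(float* data, int count, float d) {
    assert(d != 0.0f);
    const float inv = 1.0f / d;
    for (int i = 0; i < count; ++i) {
        data[i] *= inv;
    }
}
```

For example, div_pow2(40u, 3) yields 5, and scale_by_inverse(v, n, 2.0f) halves every element of v with a single division in total.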

2- Optimize inner "for" loops: they are a bottleneck, so avoid doing many calculations inside them, especially divisions and square roots.
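A small sketch of what this can look like in practice, assuming a hypothetical per-row attenuation of a float image (the function names and the formula are invented for illustration). The factor depends only on the row index, so it can be computed once per row as a reciprocal, leaving a single multiply in the inner loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Naive version: one square root and one division for every pixel.
void attenuate_naive(float* img, std::size_t width, std::size_t height, float gain) {
    assert(gain > 0.0f);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            img[y * width + x] /= std::sqrt(gain * static_cast<float>(y + 1));
        }
    }
}

// Hoisted version: the square root and the division run once per row,
// and the inner loop only multiplies.
void attenuate_hoisted(float* img, std::size_t width, std::size_t height, float gain) {
    assert(gain > 0.0f);
    for (std::size_t y = 0; y < height; ++y) {
        const float inv = 1.0f / std::sqrt(gain * static_cast<float>(y + 1));
        float* row = img + y * width;  // row pointer is also computed once per row
        for (std::size_t x = 0; x < width; ++x) {
            row[x] *= inv;
        }
    }
}
```

A compiler may hoist some loop invariants by itself, but it generally will not replace the division with a reciprocal multiply unless relaxed floating-point options are enabled, because the two can round differently. The two versions above therefore agree only to within rounding error.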

3- Use look-up tables for some mathematical functions (sin, cos, ...)
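One possible shape for such a table, as a sketch only: a 1024-entry sine table with nearest-entry lookup. The table size, the names (kTableSize, sine_table, fast_sin) and the choice of nearest-entry lookup rather than interpolation are all assumptions made for this example. With 1024 entries the absolute error stays at roughly 0.003, which may or may not be acceptable for a given use.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumed table size; a power of two lets the wrap-around be a bit mask.
constexpr std::size_t kTableSize = 1024;
constexpr float kTwoPi = 6.28318530717958647692f;

// Build the table once, on first use.
inline const std::array<float, kTableSize>& sine_table() {
    static const std::array<float, kTableSize> table = [] {
        std::array<float, kTableSize> t{};
        for (std::size_t i = 0; i < kTableSize; ++i) {
            t[i] = std::sin(kTwoPi * static_cast<float>(i) / kTableSize);
        }
        return t;
    }();
    return table;
}

// Approximate sin(radians) by picking the nearest table entry.
inline float fast_sin(float radians) {
    assert(std::isfinite(radians));
    // Convert the angle to a table position and round to the nearest slot.
    const long idx = std::lround(radians * (kTableSize / kTwoPi));
    // Masking wraps both positive and negative indices into [0, kTableSize).
    return sine_table()[static_cast<std::size_t>(idx) & (kTableSize - 1)];
}
```

The table occupies 4 KB, so whether it actually beats calling std::sin depends on it staying in cache; it is worth measuring on the target device.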

USEFUL TOOLS

  • objdump: disassembles a compiled program. This lets you compare two versions of a function and check whether one is really better optimized.
Jav_Rock asked May 29 '12 13:05


1 Answer

To answer your question about general rules when optimizing C++ code for ARM, here are a few suggestions:

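1) As you mentioned, division is expensive; many ARM cores (older ARMv7-A parts such as the Cortex-A8 and Cortex-A9, for example) have no hardware integer divide instruction at all. Use logical shifts or multiply by the inverse when possible.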
2) Memory is much slower than CPU execution; where a few logical operations can compute a value, prefer them to a small lookup table.
3) Try to write 32 bits at a time to make the best use of the write buffer. Writing shorts or chars will slow the code down considerably. In other words, it's faster to logical-OR the smaller values together and write them as DWORDs (32-bit words).
4) Be aware of your L1/L2 cache size. As a general rule, ARM chips have much smaller caches than Intel.
5) Use SIMD (NEON) when possible. NEON instructions are quite powerful and, for "vectorizable" code, can be quite fast. NEON intrinsics are available in most C++ environments and can be nearly as fast as hand-tuned ASM code.
6) Use the cache prefetch hint (PLD) to speed up looping reads. Many ARM cores don't have hardware prefetch logic as aggressive as that of modern Intel chips.
7) Don't trust the compiler to generate good code. Look at the ASM output and rewrite hotspots in ASM. For bit/byte manipulation, the C language can't specify things as efficiently as they can be accomplished in ASM. ARM has powerful 3-operand instructions, multi-load/store and "free" shifts that can outperform what the compiler is capable of generating.
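To make point 5 a bit more concrete, here is a minimal sketch of a NEON-intrinsics loop with a scalar fallback. The function name scale_add and the operation it performs are invented for this example; the intrinsics (vdupq_n_f32, vld1q_f32, vmlaq_f32, vst1q_f32) are the standard ones from arm_neon.h. The NEON path is only compiled when the compiler reports NEON support, so the same file still builds on other targets.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Computes dst[i] = a[i] * scale + b[i] for i in [0, count).
void scale_add(float* dst, const float* a, const float* b,
               float scale, std::size_t count) {
    assert(dst != nullptr && a != nullptr && b != nullptr);
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Broadcast the scalar into all four lanes once, outside the loop.
    const float32x4_t vscale = vdupq_n_f32(scale);
    // Process four floats per iteration.
    for (; i + 4 <= count; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);
        const float32x4_t vb = vld1q_f32(b + i);
        // vmlaq_f32(acc, x, y) computes acc + x * y in each lane.
        vst1q_f32(dst + i, vmlaq_f32(vb, va, vscale));
    }
#endif
    // Scalar loop: handles the leftover elements on NEON builds,
    // and the whole array everywhere else.
    for (; i < count; ++i) {
        dst[i] = a[i] * scale + b[i];
    }
}
```

Whether this beats what the auto-vectorizer produces for the plain scalar loop depends on the compiler and flags, so in the spirit of point 7 it is worth checking the generated assembly and timing both.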

BitBank answered Oct 19 '22 05:10
