
ISB instruction in ARM Cortex M

Until now I have used 3 NOPs in order to "clean" the pipeline. Recently I came across the ISB instruction, which does that for me. Looking at the ARM Information Center, I noticed that this instruction takes 4 cycles (on Cortex-M0), while the 3 NOPs take only 3.

Why should I use this instruction? How is it different from the 3 NOPs?

DrorNohi asked Feb 04 '26 00:02


2 Answers

Here's the problem with NOP (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/CHDJJGFB.html):

Operation

NOP performs no operation and is not guaranteed to be time consuming. The processor might remove it from the pipeline before it reaches the execution stage.

Use NOP for padding, for example to place the subsequent instructions on a 64-bit boundary.

The same information appears in the documentation for other ARM Cortex devices, so using this instruction for any purpose other than padding is not reliable at all. The only guarantee you have is that it will occupy 2 bytes (nop) or 4 bytes (nop.w) and that it will not perform any operation, nothing more.

Freddie Chopin answered Feb 05 '26 15:02


The reason the ISB instruction takes 4 cycles is simple. The Cortex-M instruction set is a mixture of 16-bit and 32-bit instructions. ARMv6-M cores such as the Cortex-M0 support six 32-bit instructions: BL, MSR, MRS, ISB, DMB, DSB.

All six of these instructions can be mixed freely with 16-bit instructions.

The question is how the processor knows which instructions are 16-bit and which are 32-bit. To answer it, the processor reads the first 16 bits and decodes them (1 cycle). If the opcode matches a 32-bit instruction, it knows that the next 16 bits are actually the second half of that 32-bit instruction and executes it (3 cycles).
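As a rough illustration of the length check described above, here is a minimal C sketch of how a decoder can tell from the first halfword alone whether an instruction is 32 bits wide. In the Thumb encoding, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction. The function names are hypothetical, and the sketch models only the length decision, not the cycle timing of any real core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when hw is the first halfword of a 32-bit Thumb instruction.
 * That is the case when bits [15:11] are 0b11101, 0b11110 or 0b11111.
 * (Hypothetical helper name, for illustration only.) */
int thumb_is_32bit_prefix(uint16_t hw)
{
    unsigned top5 = (unsigned)hw >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Counts the instructions in a stream of halfwords by stepping over
 * one halfword for a 16-bit instruction and two for a 32-bit one. */
size_t thumb_count_instructions(const uint16_t *code, size_t n_halfwords)
{
    assert(code != NULL || n_halfwords == 0);

    size_t count = 0;
    size_t i = 0;
    while (i < n_halfwords) {
        i += thumb_is_32bit_prefix(code[i]) ? 2 : 1;
        count++;
    }
    return count;
}
```

For example, the halfword stream 0xBF00 (NOP), 0xF3BF 0x8F6F (ISB), 0xBF00 (NOP) is walked as three instructions, because 0xF3BF has the 0b11110 prefix and pulls in the following halfword.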

That makes each of these six 32-bit instructions take 1 + 3 = 4 cycles on the Cortex-M0.

To flush the pipeline you can use 3 NOPs if you are sure about the core implementation. You must be sure that the core has neither branch prediction nor on-the-fly instruction optimization that removes consecutive NOPs. If you are sure these features are absent, use 3 NOP instructions and you will save 1 cycle. But if you are not sure, or you also want your ARM code to be portable to other architectures such as ARMv7, then you must use the ISB instruction, which is a 32-bit instruction and takes 4 cycles.
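A minimal sketch of wrapping the barrier in C, assuming a GCC- or Clang-style compiler with inline assembly. The helper and macro names are made up for illustration; in real projects CMSIS-Core already provides `__ISB()`. On builds that are not detected as Cortex-M, the sketch falls back to a compiler-only barrier so that it still compiles on a host machine.

```c
/* Detect Cortex-M targets via GCC/Clang predefined macros.
 * Assumption: this list is not exhaustive, and other toolchains
 * may need different checks. */
#if defined(__ARM_ARCH_6M__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
#define HAVE_ISB 1
#else
#define HAVE_ISB 0
#endif

/* Hypothetical helper: issues ISB where available instead of
 * relying on a run of NOPs, which the core is allowed to drop. */
static inline void instruction_sync_barrier(void)
{
#if HAVE_ISB
    __asm__ volatile ("isb" ::: "memory");
#else
    /* Host fallback: only stops the compiler from reordering. */
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Reports which path was compiled in (1 = real ISB, 0 = fallback). */
int barrier_uses_isb(void)
{
    return HAVE_ISB;
}
```

The design choice here is to hide the instruction behind one function, so the call sites stay the same whether the build targets a Cortex-M0 or something else.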

Ehsan answered Feb 05 '26 17:02