Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between the ARM, Thumb and Thumb 2 instruction encodings?

Tags:

arm

thumb

People also ask

What is the difference between Thumb and Thumb 2 instruction set?

Thumb-2 is an enhancement to the 16-bit Thumb instruction set. It adds 32-bit instructions that can be freely intermixed with 16-bit instructions in a program. The additional 32-bit instructions enable Thumb-2 to cover the functionality of the ARM instruction set.

What is the difference between the ARM and Thumb instruction format?

ARM instructions are 32 bits wide. Thumb instructions are 16 or 32-bits wide. The ARM instruction set is a set of 32-bit instructions providing a comprehensive range of operations. ARMv4T and later define a 16-bit instruction set called Thumb.

What is a Thumb instruction in ARM?

The Thumb instruction set is a subset of the most commonly used 32-bit ARM instructions. Thumb instructions are each 16 bits long, and have a corresponding 32-bit ARM instruction that has the same effect on the processor model.

Which are the different types of ARM instructions?

They are move, arithmetic, logical, comparison and multiply instructions and multiply instructions.


Oh, ARM and their silly naming...

It's a common misconception, but officially there's no such thing as a "Thumb-2 instruction set".

Ignoring ARMv8 (where everything is renamed and AArch64 complicates things), from ARMv4T to ARMv7-A there are two instruction sets: ARM and Thumb. They are both "32-bit" in the sense that they operate on up-to-32-bit-wide data in 32-bit-wide registers with 32-bit addresses. In fact, where they overlap they represent the exact same instructions - it is only the instruction encoding which differs, and the CPU effectively just has two different decode front-ends to its pipeline which it can switch between. For clarity, I shall now deliberately avoid the terms "32-bit" and "16-bit"...

ARM instructions have fixed-width 4-byte encodings which require 4-byte alignment. Thumb instructions have variable-length (2 or 4-byte, now known as "narrow" and "wide") encodings requiring 2-byte alignment - most instructions have 2-byte encodings, but bl and blx have always had 4-byte encodings*. The really confusing bit came in ARMv6T2, which introduced "Thumb-2 Technology". Thumb-2 encompassed not just adding a load more instructions to Thumb (mostly with 4-byte encodings) to bring it almost to parity with ARM, but also extending the execution state to allow for conditional execution of most Thumb instructions, and finally introducing a whole new assembly syntax (UAL, "Unified Assembly Language") which replaced the previous separate ARM and Thumb syntaxes and allowed writing code once and assembling it to either instruction set without modification.

The Cortex-M architectures only implement the Thumb instruction set - ARMv7-M (Cortex-M3/M4/M7) supports most of "Thumb-2 Technology", including conditional execution and encodings for VFP instructions, whereas ARMv6-M (Cortex-M0/M0+) only uses Thumb-2 in the form of a handful of 4-byte system instructions.

Thus, the new 4-byte encodings (and those added later in ARMv7 revisions) are still Thumb instructions - the "Thumb-2" aspect of them is that they can have 4-byte encodings, and that they can (mostly) be conditionally executed via it (and, I suppose, that their menmonics are only defined in UAL).

* Before ARMv6T2, it was actually a complicated implementation detail as to whether bl (or blx) was executed as a 4-byte instruction or as a pair of 2-byte instructions. The architectural definition was the latter, but since they could only ever be executed as a pair in sequence there was little to lose (other than the ability to take an interrupt halfway through) by fusing them into a single instruction for performance reasons. ARMv6T2 just redefined things in terms of the fused single-instruction execution


In addition to Notlikethat's answer, and as it hints at, ARMv8 introduces some new terminology to try to reduce the confusion (of course adding even more new terminology):

There is a 32-bit execution state (AArch32) and a 64-bit execution state (AArch64).

The 32-bit execution state supports two different instruction sets: T32 ("Thumb") and A32 ("ARM"). The 64-bit execution state supports only one instruction set - A64.

All A64, like all A32, instructions are 32-bit (4 byte) in size, requiring 4-byte alignment.

Many/most A64 instructions can operate on both 32-bit and 64-bit registers (or arguably 32-bit or 64-bit views of the same underlying 64-bit register).

All ARMv8 processors (like all ARMv7 processors) that implement AArch32 support Thumb-2 instructions in the T32 instruction set.

Not all ARMv8-A processors implement AAarch32, and some don't implement AArch64. Some Processors support both, but only support AArch32 at lower exception levels.