AArch64: what is late-forwarding?

Question

"Late-forwarding" is mentioned in the "Arm Neoverse E1 Core Software Optimization Guide" (as well as in Arm's optimization guides for some other CPU models):

| Instruction Group | Instructions | Exec Latency | Exec Throughput | Notes |
|---|---|---|---|---|
| Multiply accumulate (32-bit) | MADD, MSUB | 3 (2) | 1 | 2 |
| Multiply accumulate (64-bit) | MADD, MSUB | 5 (4) | 1/3 | 2 |

(2) Multiply-accumulate pipelines support late-forwarding of accumulate operands from similar μOPs, allowing a typical sequence of multiply-accumulate μOPs to issue one every N cycles (accumulate latency N shown in parentheses).

What does the term "late-forwarding" mean? What sequence of instructions would be subject to late-forwarding (a counter-example would also be helpful)?

Paul A. Clayton · Accepted Answer

Late forwarding for multiply-add operations means that the addend can be made available after the multiplication has completed, rather than having to be available when the multiply-add operation begins execution. The multiplication itself has no data dependence on the addend, so it can proceed without it. In a floating-point multiply-add, some of the work for the addition can be done in parallel with the multiplication (the exponent of the product is available early and can be compared with the addend's exponent to determine the shift needed before the addition), so one may want the addend before the entire product is available. Even in that case the addend is not needed until much later than the multiplicands. The integer MADD and MSUB in the table have no exponents, but the same reasoning applies: the accumulate operand is needed later than the multiplicands.
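To make "which sequences benefit" concrete, here is a toy timing model in Python. It is a sketch under stated assumptions, not a description of the real E1 pipeline: it assumes multiplicands are read at issue, the accumulate operand is read `exec_latency - acc_latency` cycles later, and nothing else limits issue (throughput and issue width are ignored). The function name and parameters are made up for illustration. The assembly in the comments shows the two kinds of dependency chain.

```python
def chain_finish_cycle(n_ops, exec_latency, acc_latency, via_accumulator):
    """Cycle at which the last result of a dependent MADD chain is ready.

    Toy model (assumption): multiplicands are read at issue, while the
    accumulate operand is read (exec_latency - acc_latency) cycles later.
    """
    # Spacing between consecutive dependent operations in the chain.
    step = acc_latency if via_accumulator else exec_latency
    last_issue = (n_ops - 1) * step
    return last_issue + exec_latency


# Accumulator chain, the case that benefits from late-forwarding:
#   madd w0, w1, w2, w0    // w0 = w1*w2 + w0
#   madd w0, w3, w4, w0    // w0 = w3*w4 + w0
# Multiplicand chain, the counter-example:
#   madd w0, w0, w2, w5    // w0 = w0*w2 + w5
#   madd w0, w0, w4, w6    // w0 = w0*w4 + w6

# 32-bit MADD figures from the table: exec latency 3, accumulate latency 2.
print(chain_finish_cycle(8, 3, 2, via_accumulator=True))   # 17
print(chain_finish_cycle(8, 3, 2, via_accumulator=False))  # 24
```

Under these assumptions, eight chained 32-bit MADDs finish 7 cycles earlier when the dependency runs through the accumulate operand than when it runs through a multiplicand.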

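Because the addend is forwarded late, a chain of dependent accumulations sees the shorter accumulate latency rather than the full execution latency: in the table above, 2 cycles instead of 3 for 32-bit MADD, and 4 instead of 5 for the 64-bit form. This reduces the number of independent accumulator registers (and the amount of parallelism) needed to cover the latency. The benefit applies only when the dependency is through the accumulate operand; if the previous result feeds one of the multiplicands, the full execution latency applies.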
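The effect on the number of accumulators can be estimated with a small calculation. This is a rough sketch, assuming the only limits are the dependency latency between chained operations and the issue interval (the reciprocal of the throughput column); the helper name is hypothetical.

```python
def accumulators_needed(dependency_latency, issue_interval):
    """Independent accumulation chains needed to keep the multiplier busy.

    dependency_latency: cycles between dependent multiply-accumulates
    issue_interval:     cycles between issues the pipeline can sustain
                        (reciprocal of the throughput column)
    """
    # Ceiling division without floating point.
    return -(-dependency_latency // issue_interval)


# 32-bit MADD: throughput 1 per cycle, so the issue interval is 1.
print(accumulators_needed(2, 1))  # with late-forwarding: 2
print(accumulators_needed(3, 1))  # full execution latency: 3
```

With the 64-bit figures (issue interval 3, latencies 4 and 5) this estimate gives 2 in both cases, so in this simple model the 64-bit benefit shows up as a shorter dependent chain rather than as fewer accumulators.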
