Do any CPU architectures use Metadata?

Question

I've recently been looking into a concept for a CPU architecture called the Mill.

The Mill (though it may be vaporware) uses metadata for various things in the CPU, such as a software speculative load producing a value tagged as not a result (NaR). If a later instruction tries to store that result non-speculatively, hardware detects that and faults.

I was wondering if any other CPUs are similar in the sense of using metadata in the architecture.

Peter Cordes · Accepted Answer

A few random examples I know of, certainly not an exhaustive list. IDK if there are any that use metadata for all the things the Mill does. Some of what the Mill does is unique, but some of the ideas have appeared in similar forms in other ISAs.

Yes, IA-64 Itanium also had not-a-thing load results that would fault if you read them, for the same software-speculation reason as the Mill. Its architects described it as an EPIC ISA. (EPIC = Explicitly Parallel Instruction Computing, as opposed to CISC or RISC. It's also a VLIW.) From Wikipedia:

The architecture implements a large number of registers:

128 general integer registers, which are 64-bit plus one trap bit ("NaT", which stands for "not a thing") used for speculative execution. 32 of these are static, the other 96 are stacked using variably-sized register windows, or rotating for pipelined loops. gr0 always reads 0.

128 floating point registers. The floating point registers are 82 bits long to preserve precision for intermediate results. Instead of a dedicated "NaT" trap bit like the integer registers, floating point registers have a trap value called "NaTVal" ("Not a Thing Value"), similar to (but distinct from) NaN. These also have 32 static registers and 96 windowed or rotating registers. fr0 always reads +0.0, and fr1 always reads +1.0.

So for integer, there truly is separate metadata. For FP, the metadata is encoded in-band.
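To make the out-of-band vs. in-band distinction concrete, here is a minimal Python sketch. It is only a toy model loosely based on the description above, not real IA-64 semantics: the names `IntReg`, `speculative_load`, `NaTConsumptionFault` and `NATVAL` are made up for illustration, and memory is just a dict.

```python
from dataclasses import dataclass


class NaTConsumptionFault(Exception):
    """Hypothetical fault: a deferred-exception value was used non-speculatively."""


@dataclass(frozen=True)
class IntReg:
    value: int         # the 64 data bits
    nat: bool = False  # separate extra bit: out-of-band metadata


def speculative_load(memory, addr):
    # A speculative load that would fault defers the fault by setting NaT.
    if addr not in memory:
        return IntReg(0, nat=True)
    return IntReg(memory[addr])


def add(a, b):
    # The NaT bit propagates through dependent computation.
    return IntReg((a.value + b.value) & (2**64 - 1), nat=a.nat or b.nat)


def store(memory, addr, reg):
    # Non-speculative use: this is where the deferred fault is finally taken.
    if reg.nat:
        raise NaTConsumptionFault(f"store of NaT to {addr:#x}")
    memory[addr] = reg.value


# In-band alternative (the FP case): one encoding of the data itself is
# reserved as the marker. NATVAL is a stand-in for that reserved bit pattern.
NATVAL = object()


def fp_add(a, b):
    if a is NATVAL or b is NATVAL:
        return NATVAL
    return a + b
```

With `mem = {0x10: 5}`, adding the results of `speculative_load(mem, 0x10)` and `speculative_load(mem, 0x20)` gives a register with `nat` set, and passing that to `store` raises the fault. The integer version costs an extra bit per register but leaves all 2^64 data values usable; the in-band version costs nothing extra but gives up one encoding.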

Other examples of metadata that aren't related to software-visible speculation include:

The x87 FPU has 8 architectural registers, but normal instructions access them as a register stack where the underlying register for st(0) is determined by the top-of-stack (TOP) field in the x87 status word. (i.e. the metadata is architecturally visible and can be modified with fincstp to rotate the "revolver barrel".) See http://www.ray.masmcode.com/tutorial/fpuchap1.htm for a good diagram and intro to the x87 design. Also, x87 has a free / in-use flag for each register (kept in the tag word); trying to load into an already-in-use register produces an FP exception (and a NaN if exceptions are masked). Normally the in-use flag is cleared by "popping" the register stack, e.g. with fstp to store and pop, but there's also ffree to mark any x87 register as free.
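A rough Python model of that register-stack bookkeeping, as a sketch only. It assumes unmasked exceptions (so an overflow raises instead of producing a NaN), collapses the real tag word to a single in-use bit per register, and the class and exception names are invented; the method names just mirror the instruction mnemonics.

```python
class X87StackFault(Exception):
    """Stand-in for the x87 stack overflow/underflow exception."""


class X87Stack:
    def __init__(self):
        self.regs = [0.0] * 8      # the 8 underlying registers
        self.in_use = [False] * 8  # per-register free / in-use flag (simplified tag)
        self.top = 0               # TOP field of the status word

    def _phys(self, i):
        # st(i) maps to underlying register (TOP + i) mod 8.
        return (self.top + i) % 8

    def fld(self, value):
        new_top = (self.top - 1) % 8  # a push decrements TOP
        if self.in_use[new_top]:
            raise X87StackFault("stack overflow: target register is in use")
        self.top = new_top
        self.regs[new_top] = value
        self.in_use[new_top] = True

    def fstp(self):
        phys = self.top
        if not self.in_use[phys]:
            raise X87StackFault("stack underflow: st(0) is empty")
        value = self.regs[phys]
        self.in_use[phys] = False     # popping marks the register free
        self.top = (self.top + 1) % 8
        return value

    def ffree(self, i):
        # Mark st(i) free without moving TOP.
        self.in_use[self._phys(i)] = False

    def fincstp(self):
        # Rotate the "barrel" without touching any in-use flags.
        self.top = (self.top + 1) % 8
```

After eight `fld` calls every register is in use, so a ninth `fld` faults; `ffree(7)` frees the register the next push would land in, and the push then succeeds.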

Microarchitectural (performance effects only):

Obviously a microarchitecture has to keep lots of info about instructions that are in flight, like whether they've finished executing or not. But there is at least one interesting case of metadata about data, not code:

In AMD Bulldozer-family and Bobcat/Jaguar, the SIMD FPUs apparently keep some extra metadata alongside the actual architectural register value. As Agner Fog explains in his microarchitecture PDF, (Bulldozer-family) 19.11 Data delay between different execution domains:

There is a large penalty when the output of a floating point calculation is input to a floating point calculation with a different precision, for example if the output of a double precision floating point addition is input to a single precision addition. This has hardly any practical significance since such a sequence is most likely to be a programming error, but it indicates that the processor stores extra information about floating point numbers beyond the 128 bits in an XMM register. This effect is not seen on Intel processors.

This might be related to the fact that Bulldozer has FP latency 1 cycle lower when forwarding from an FMA-unit instruction to another FMA instruction, like mulps forwarding to addps with no sqrtps or xorps in between.

Also, various AMD uarches have marked instruction boundaries in L1 I-cache, reducing the latency of decoding the same instructions repeatedly. Intel Silvermont also does this.

John D McCalpin · Answer

There have been many tagged architectures, most of them primarily research projects. Some have minimal tagging, such as the Tera MTA, which supported four extra bits per 64-bit word -- a "full/empty" bit, an "indirection" bit, and two "trap" bits. The "full/empty" bit was the most important, allowing more efficient producer-consumer transactions than in cached systems.
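As a software analogy for the full/empty bit, here is a small Python sketch of one memory word that a producer may only write when it is empty and a consumer may only read when it is full. This emulates the idea with a lock and condition variable, which is exactly the overhead the hardware bit avoids; `FullEmptyWord`, `write_when_empty`, `read_when_full` and `run_demo` are hypothetical names, not MTA operations.

```python
import threading


class FullEmptyWord:
    """One data word plus a full/empty bit (emulated in software)."""

    def __init__(self):
        self._value = 0
        self._full = False                  # the "full/empty" tag bit
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        # Block until the word is empty, then write it and mark it full.
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        # Block until the word is full, then read it and mark it empty.
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value


def run_demo(n=5):
    word = FullEmptyWord()
    received = []

    def producer():
        for i in range(n):
            word.write_when_empty(i)

    def consumer():
        for _ in range(n):
            received.append(word.read_when_full())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because each write waits for empty and each read waits for full, the two threads alternate strictly and `run_demo(5)` hands the values over in order with no separate queue or flag variable.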

The most advanced tagged architecture that I have seen is still under development. It was originally developed by BAE Systems under DARPA funding, with many important and interesting developments/papers under the "CRASH-SAFE" project:

https://web.archive.org/web/20191022221212/http://www.crash-safe.org/

This is a "tag rich" approach, with 64 tag bits for every 64 data bits, and the ability to use the 64 tag bits as a pointer to an arbitrarily large tag structure if needed. The approach provides almost completely independent pipelines for "data" and "tag", with the "security pipeline" combining the tags from the input data with the input instruction type to determine whether the execution is valid, and if valid, what the output tag should be.
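A toy Python sketch of that split between "data" and "tag" processing. The rule table, the string tags (`"int"`, `"ptr"`) and the function names are all assumptions made up for illustration; the real design uses 64-bit tags and software-defined policies, and nothing here reflects its actual rule format.

```python
class PolicyViolation(Exception):
    """Hypothetical trap raised when no rule allows the operation."""


# Rule table: (opcode, input tags) -> output tag.
# Anything not listed is rejected.
RULES = {
    ("add", ("int", "int")): "int",
    ("add", ("ptr", "int")): "ptr",   # pointer + offset is still a pointer
    ("add", ("int", "ptr")): "ptr",
    ("sub", ("ptr", "ptr")): "int",   # pointer difference is a plain integer
    ("sub", ("ptr", "int")): "ptr",
    ("sub", ("int", "int")): "int",
}

# The ordinary data computation, which never looks at tags.
DATA_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def check_tags(opcode, in_tags):
    # "Tag pipeline": combine the instruction type with the input tags.
    try:
        return RULES[(opcode, in_tags)]
    except KeyError:
        raise PolicyViolation(f"{opcode} not allowed on tags {in_tags}") from None


def execute(opcode, operands):
    # operands is a sequence of (value, tag) pairs.
    values = tuple(value for value, _ in operands)
    tags = tuple(tag for _, tag in operands)
    out_tag = check_tags(opcode, tags)       # is it valid, and what is the result tag?
    out_value = DATA_OPS[opcode](*values)    # "data pipeline"
    return out_value, out_tag
```

For example, `execute("add", [(0x1000, "ptr"), (8, "int")])` yields `(0x1008, "ptr")`, while adding two `"ptr"` operands raises `PolicyViolation` because no rule covers it.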

After the end of the CRASH-SAFE project (~2015), Draper Laboratory picked up the project, turning it into the "Dover Inherently Secure Processor" (https://www.draper.com/explore-solutions/inherently-secure-processor). In 2017, Draper spun off https://www.dovermicrosystems.com/, where the technology continues to be developed for a variety of architectures under the name "CoreGuard". Two projects that interest me are:

RISC-V (https://riscv.org/wp-content/uploads/2016/01/Wed1430-dover_riscv_jan2016_v3.pdf)
Tensilica (https://www.cadence.com/en_US/home/company/newsroom/press-releases/pr/2020/dover-microsystems-and-cadence-partner-to-deliver-secure-process.html)

The first is interesting because it pushes the ideas into the RISC-V open source processor community, and the second because of both the support of an industry heavyweight (Cadence) and the link with the (under-appreciated) Tensilica architecture.

The motivation behind this approach is security, and once you realize how much this can "fix" in the security world it becomes difficult to understand why you would even try to make computers any other way. But it is not limited to security -- tags can be extremely useful for very advanced approaches to data typing, such as "dimensional analysis" or separate index variable types for different dimensions of multi-dimensional arrays.
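To illustrate the "dimensional analysis" idea, here is a minimal Python sketch in which each value carries a tag of unit exponents and the operations check or combine those tags. The tag encoding (a tuple of metre, kilogram, second exponents) and the function names are assumptions for this example only, not something taken from any of the projects above.

```python
# Tag = exponents of (metre, kilogram, second).
METRE = (1, 0, 0)
SECOND = (0, 0, 1)
SCALAR = (0, 0, 0)


def tagged_add(a, b):
    # Addition is only meaningful when both operands have the same dimension.
    (va, ta), (vb, tb) = a, b
    if ta != tb:
        raise TypeError(f"dimension mismatch: {ta} vs {tb}")
    return va + vb, ta


def tagged_mul(a, b):
    # Multiplication adds the exponents of the two tags.
    (va, ta), (vb, tb) = a, b
    return va * vb, tuple(x + y for x, y in zip(ta, tb))


def tagged_div(a, b):
    # Division subtracts the exponents of the two tags.
    (va, ta), (vb, tb) = a, b
    return va / vb, tuple(x - y for x, y in zip(ta, tb))
```

Here `tagged_div((10.0, METRE), (2.0, SECOND))` produces `(5.0, (1, 0, -1))`, i.e. metres per second, and `tagged_add((1.0, METRE), (1.0, SECOND))` is rejected. In a tagged machine the check would happen alongside the arithmetic rather than as extra software work.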
