 

What EXACTLY is the difference between Intel's and AMD's ISA, if any?

I know people have asked similar questions like this before, however there is so much conflicting information that I really want to try and clear it up once and for all. I will attempt to do so by clearly distinguishing between instruction set architecture (ISA) and actual hardware implementation. First my attempted clarifications:

1.) Currently there are Intel 64 and AMD64 CPUs out there (among others, but these are the focus)

2.) Given that an ISA is the binary representation of one or more CPU instructions, an ISA is completely separate from its actual hardware implementation.

My question(s):

Do the differences between Intel 64 and AMD64 CPUs come from different or extended x86-64 ISAs? Or from different hardware implementations of the same x86-64 ISA? Or both?

asked Jul 22 '16 by Jason


1 Answer

Yes, the ISA is a document / specification, not hardware. Implementing all of it correctly is what makes something an x86 CPU, rather than just something with similarities to x86.

See the x86 tag wiki for links to the official docs (Intel's manuals).

Intel and AMD's implementations of the x86 ISA differ mainly in performance, and in which extensions to the instruction-set they support. Software can query what's supported using the CPUID instruction.

There are also non-performance differences, like occasional minor differences in semantics of instructions, especially privileged instructions that OSes need to use:

  • What is the compatible subset of Intel's and AMD's x86-64 implementations?
  • https://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64

One of the major divergences here is that Intel, AMD, and VIA each have their own hardware-virtualization extensions which don't even try to be compatible. So a hypervisor like Xen needs separate "drivers" or "backend" code for each of these extensions. But those are still extensions, not part of baseline x86.

SIMD extensions for use by user-space programs end up being available on both, often with a delay thanks to Intel's efforts to screw over AMD with anti-competitive practices. This costs everyone else's time, and is often detrimental to the overall x86 ecosystem (e.g. SSSE3 could have been assumed as a baseline for more software by now), but helps Intel's bottom line. A good example here: AMD Bulldozer supports FMA4, but Intel changed their mind at the last minute and implemented FMA3 in Haswell. AMD didn't support that until their next microarch (Piledriver).


Given that an ISA is the binary representation of 1 or more CPU instructions.

No, an ISA is much more than that. Everything that Intel documents as being guaranteed across all x86 CPUs is part of the ISA. This isn't just the detailed behaviour of every instruction, but also stuff like which control register does what, and the memory ordering rules. Basically everything in the manuals published by Intel and AMD that isn't prefaced by "on such and such a specific model of CPU".

I expect there are a few cases where Intel's and AMD's system programming guides differ on how x86 should work. (And VIA's if they publish their own for their x86 CPUs). I haven't checked, but I'm pretty sure user-space doesn't suffer from this: If there are differences, they're limited to privileged instructions that only work if the kernel runs them. Anyway, in that case I guess you could say the x86 ISA is the common subset of what Intel and AMD document.


Note that experimenting to find what real hardware does in practice is useful for understanding the docs, but NOT a replacement for reading them. You don't want your code to rely on how an instruction happens to behave on the CPU you tested.

However, Intel does test their new designs with real software, because not being able to run existing versions of Windows would be a commercial downside. e.g. Windows 9x doesn't invalidate a TLB entry that could only have been filled speculatively (all the rest of this example is just a summary of and extrapolation from that very detailed blog post). This was either a performance hack based on the assumption that it was safe (and it was safe on hardware at the time), or an unnoticed bug. Either way, it couldn't have been detected by testing on the hardware of the time.

Modern Intel CPUs do speculative page walks, but even as recently as Haswell they detect and shoot down mis-speculation, so code that assumes this doesn't happen will still work.

This means the real hardware gives a stronger ordering guarantee than the ISA, which says:

The processor may cache translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path.

Still, depending on this stronger behaviour would be a mistake, unless you only do it on known microarchitectures. AMD K8/K10 is like Intel, but Bulldozer-family speculates without any detect+rollback mechanism to give coherence, so that Win9x kernel code isn't safe on that hardware. And future Intel hardware might drop the detect+rollback mechanism, too.

TL:DR: all the uarches implement what the x86 ISA says, but some give stronger guarantees. If you're as big as Microsoft, Intel and AMD will design CPUs that reproduce the non-ISA-guaranteed behaviour that your code depends on. At least until that software is long-obsolete. There's no true guarantee that future Intel uarches will keep the rollback mechanism. If Intel ever does another redesign from the ground up (like P4 / NetBurst, instead of just building on their existing Sandybridge uarch family), that would be when they could plausibly change something.


A different example: the bsf instruction with an input of zero leaves its output undefined, according to the paper spec in Intel's insn ref manual.

But for any specific CPU, there will be some pattern of behaviour, like setting the output to zero, or leaving it unchanged. On paper, it would be valid for an out-of-order-execution CPU to really give unpredictable results that were different for the same inputs, because of different microarchitectural state.

But the behaviour Intel chooses to implement in silicon is to always leave the destination unchanged when the bsf or bsr input is zero. AMD does the same, and even documents the behaviour. It's basically an unofficial guarantee that mov eax,32 / bsf eax, ebx will work exactly like tzcnt (except for flag setting, e.g. ZF based on the input being 0, rather than the output).

This is why popcnt / lzcnt / tzcnt have a false dependency on the output register on Intel CPUs!

It's common for CPU vendors to go above and beyond the paper ISA spec to avoid breaking some existing code that depends on this behaviour (e.g. if that code is part of Windows, or other major pieces of software that Intel / AMD test on their new CPU designs).

As Andy Glew said in a comment thread about the coherent page walk thing mentioned above, and about self-modifying code:

It is pretty common that a particular implementation has to implement rules compatible with but stronger than the architectural statement. But not all implementations have to do it the same way.

answered Sep 27 '22 by Peter Cordes