During an interview I was asked whether I knew of any x64 instructions that behave differently depending on the CPU used. I couldn't find any documentation on that anywhere. Does anyone know what these instructions are and why this is the case?
There are some that leave a register or some flags with undefined values. Intel and AMD may differ there.
In some cases, the actual behaviour of real hardware for these undefined cases preserves backwards compatibility for some old software that relies on it. For example, BSF with input=0 sets ZF and leaves the destination register unmodified. (On both current Intel and AMD hardware. IDK if any old Intel hardware was ever different; if not, bsf/bsr isn't really an example of an instruction that executes differently, just a lack of documented guarantees of being future-proof.)
But the difference is that Intel documents it as leaving the destination register with "undefined" contents. AMD's manuals explicitly document and guarantee that AMD CPUs will leave the destination unmodified in that case.
AMD's AMD64 manual (March 2017) for bsr/bsf:
If the second operand contains 0, the instruction sets ZF to 1 and does not change the contents of the destination register
So it's not guaranteed on paper that it's safe to emulate tzcnt / implement std::countr_zero as mov eax, 32 / bsf eax, edx, even though that works in practice and will likely continue working on future CPUs. (This is why bsf / bsr have an output dependency.) Intel might eventually document this behaviour, in which case compilers will be able to use it for a more efficient countr_zero / countl_zero without BMI1. Intel did recently document that AVX support implies that 16-byte aligned loads / stores are atomic on Intel CPUs, so it's not unprecedented for a vendor to document something that their CPUs have been doing for years.
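The mov eax, 32 / bsf eax, edx idiom can be sketched in C with GNU-style inline asm. This is only an illustration, not how any compiler actually implements std::countr_zero. The function names are made up for the example, the asm path relies on the destination-unmodified behaviour that only AMD guarantees on paper, and the fallback branch is a plain software model for non-x86 or non-GNU builds.

```c
#include <assert.h>
#include <stdint.h>

/* Portable software model of BSF for a non-zero source:
   index of the lowest set bit. (Hypothetical helper name.) */
static uint32_t lowest_set_bit_index(uint32_t src) {
    assert(src != 0);
    uint32_t i = 0;
    while (((src >> i) & 1u) == 0)
        i++;
    return i;
}

/* Hypothetical name; assumes BSF leaves the destination unmodified
   when the source is 0 (documented by AMD, only observed on Intel). */
static uint32_t countr_zero_via_bsf(uint32_t x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t result = 32;          /* mov eax, 32 */
    __asm__("bsfl %1, %0"          /* bsf eax, edx (AT&T operand order) */
            : "+r"(result)         /* read-write: old value must survive x == 0 */
            : "r"(x)
            : "cc");               /* BSF writes flags */
    return result;
#else
    return x == 0 ? 32 : lowest_set_bit_index(x);
#endif
}
```

The "+r" constraint makes the destination an input as well as an output, which is the same output dependency the hardware has.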
If performance differences count, there are many (see links in the x86 tag wiki)!
You're not just talking about unsupported instructions, are you? Like LAHF/SAHF being unsupported in long mode on some very early x86-64 CPUs, or CMPXCHG16B also being unsupported on early K8?
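For unsupported-instruction cases like those two, software normally checks CPUID feature bits before using the instruction. A minimal sketch, assuming GCC or clang with <cpuid.h>: the struct and function names are made up for the example, and the bit positions (CPUID leaf 1 ECX bit 13 for CMPXCHG16B, leaf 0x80000001 ECX bit 0 for LAHF/SAHF in 64-bit mode) are as I understand the vendor manuals, so check them before relying on this.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Returns 1 if the given bit (0..31) is set in a CPUID output register. */
static int bit_is_set(uint32_t reg, unsigned bit) {
    assert(bit < 32);
    return (int)((reg >> bit) & 1u);
}

/* Hypothetical container for the two features mentioned above. */
struct early_x64_features {
    int cmpxchg16b;           /* CPUID.01H:ECX bit 13 */
    int lahf_sahf_long_mode;  /* CPUID.80000001H:ECX bit 0 */
};

static struct early_x64_features query_early_x64_features(void) {
    struct early_x64_features f = {0, 0};
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        f.cmpxchg16b = bit_is_set(ecx, 13);
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        f.lahf_sahf_long_mode = bit_is_set(ecx, 0);
#endif
    return f;  /* both fields stay 0 on non-x86 builds */
}
```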
The most interesting case of unsupported instructions is that LZCNT decodes as BSR on CPUs that don't support it, the REP prefix being ignored. Even for non-zero inputs, they return opposite results (_lzcnt_u32(x) == 31-bsr(x)). TZCNT similarly decodes as (REP) BSF on CPUs that don't support it, but they do the same thing except when input = 0. I didn't mention this earlier, because running the same machine-code differently is not the same thing as running the same instruction differently, but it sounds like this is the kind of thing you're asking for.
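To make the LZCNT vs. BSR difference concrete, here is a small portable C model of what the same byte sequence computes on the two kinds of CPU. It's a software model only, not the intrinsics themselves: bsr_model, lzcnt_model and run_lzcnt_encoding are hypothetical names, and cpu_has_lzcnt stands in for whatever CPUID check real code would do.

```c
#include <assert.h>
#include <stdint.h>

/* Model of BSR: index of the highest set bit. Input 0 is excluded
   because that is the case the manuals disagree about. */
static uint32_t bsr_model(uint32_t x) {
    assert(x != 0);
    uint32_t i = 31;
    while (((x >> i) & 1u) == 0)
        i--;
    return i;
}

/* Model of LZCNT: number of leading zero bits, 32 for input 0. */
static uint32_t lzcnt_model(uint32_t x) {
    return x == 0 ? 32 : 31 - bsr_model(x);
}

/* What the encoding of "lzcnt dst, src" yields: a real LZCNT where it is
   supported, a BSR (REP prefix ignored) where it is not.
   Only call with x != 0 when cpu_has_lzcnt is 0. */
static uint32_t run_lzcnt_encoding(uint32_t x, int cpu_has_lzcnt) {
    return cpu_has_lzcnt ? lzcnt_model(x) : bsr_model(x);
}
```

For x = 1, the model gives 31 with LZCNT support and 0 without, which is the "opposite results" case from the text.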
Are we talking only about un-privileged instructions? There are probably many more differences in the behaviour of privileged instructions. For example, Intel and AMD both have different bugs in SYSRET that Linux has to work around to avoid malicious user-space being able to cause a kernel hang.
Another case that I'm not sure counts: PREFETCHW runs on Intel CPUs from at least Core2 to Haswell as a NOP, but on AMD CPUs (and Intel since Broadwell) as an actual prefetch.
So some CPUs run it as a NOP, some run it as a prefetch (thus no architecturally visible effect either way), except on ancient CPUs where it faults as an illegal insn. 64-bit Windows 8.1 apparently requires that PREFETCHW can run without faulting (which stops it from running on (some?) 64-bit Pentium4 CPUs).
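A sketch of why the NOP vs. real-prefetch difference isn't architecturally visible: a write-prefetch hint in C changes nothing about the result of the loop, only (possibly) its timing. This assumes GCC or clang's __builtin_prefetch; the macro name, function name and prefetch distance are made up for the example, and whether the compiler actually emits PREFETCHW depends on the target options (I believe GCC needs something like -mprfchw, otherwise it may pick a different prefetch instruction).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
/* rw = 1 requests a prefetch with intent to write; locality = 3 means
   keep the line in all cache levels. */
#define PREFETCH_FOR_WRITE(p) __builtin_prefetch((p), 1, 3)
#else
#define PREFETCH_FOR_WRITE(p) ((void)0)  /* no hint available: do nothing */
#endif

enum { PREFETCH_DISTANCE = 16 };  /* hypothetical tuning value, in elements */

/* Adds 1 to every element. The hint may make the stores faster,
   but the final contents are the same with or without it. */
static void increment_all(uint32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_FOR_WRITE(&a[i + PREFETCH_DISTANCE]);
        a[i] += 1;
    }
}
```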