I'm running a Core i7-3930K, which is based on the Sandy Bridge microarchitecture. When I execute the following code (compiled with MSVC 19, VS2015), the results surprise me (see the comments):
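```cpp
#include <immintrin.h> // _tzcnt_u64
#include <intrin.h>    // __cpuidex (MSVC)
#include <bitset>
#include <cstdint>
#include <iostream>
using namespace std;

int wmain(int argc, wchar_t* argv[])
{
    uint64_t r = 0b1110'0000'0000'0000ULL;
    uint64_t tzcnt = _tzcnt_u64(r);
    cout << tzcnt << endl; // prints 13
    int info[4]{};
    __cpuidex(info, 7, 0);
    int ebx = info[1];
    cout << bitset<32>(ebx) << endl; // prints 32 zeros (including the BMI1 bit)
    return 0;
}
```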
Disassembly shows that the `tzcnt` instruction is indeed emitted from the intrinsic:

```
    uint64_t r = 0b1110'0000'0000'0000ULL;
00007FF64B44877F  48 C7 45 08 00 E0 00 00  mov   qword ptr [r],0E000h
    uint64_t tzcnt = _tzcnt_u64(r);
00007FF64B448787  F3 48 0F BC 45 08        tzcnt rax,qword ptr [r]
00007FF64B44878D  48 89 45 28              mov   qword ptr [tzcnt],rax
```
How come I'm not getting an `#UD` (invalid opcode) exception? The instruction works correctly, yet the CPU reports that it does not support it.

Could this be some weird microcode revision that implements the instruction but doesn't report support for it (or for the other instructions in BMI1)?

I haven't checked the rest of the BMI1 instructions, but I'm wondering how common this phenomenon is.
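For reference, here is a minimal sketch (not from the original post) of a somewhat more defensive version of the feature check in the question: it confirms that CPUID leaf 7 exists before reading the BMI1 flag, CPUID.(EAX=07H, ECX=0):EBX bit 3. The names `ebx_reports_bmi1` and `cpu_reports_bmi1` are hypothetical, and it assumes MSVC's `__cpuid`/`__cpuidex` or GCC/Clang's `__get_cpuid_count` from `<cpuid.h>`. Note that `lzcnt` is not covered by the BMI1 bit; it is reported separately in CPUID.80000001H:ECX bit 5.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

// CPUID.(EAX=07H, ECX=0):EBX bit 3 = BMI1 (tzcnt, andn, bextr, blsi, blsmsk, blsr).
constexpr std::uint32_t kBmi1Bit = 1u << 3;

// Hypothetical pure helper: does this EBX value advertise BMI1?
constexpr bool ebx_reports_bmi1(std::uint32_t ebx) { return (ebx & kBmi1Bit) != 0; }

// Hypothetical wrapper: queries the running CPU. Returns false if leaf 7
// is not supported or if this is not an x86 build.
bool cpu_reports_bmi1() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int info[4] = {};
    __cpuid(info, 0);              // info[0] = highest supported basic leaf
    if (info[0] < 7) return false;
    __cpuidex(info, 7, 0);         // leaf 7, subleaf 0
    return ebx_reports_bmi1(static_cast<std::uint32_t>(info[1]));
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count returns 0 when leaf 7 exceeds the highest supported leaf.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return ebx_reports_bmi1(ebx);
#else
    return false;
#endif
}
```

On a Sandy Bridge part such as the i7-3930K this should return `false`, consistent with the all-zero EBX printed in the question.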
The reason that Sandy Bridge (and earlier) processors seem to support `lzcnt` and `tzcnt` is that both instructions have a backward-compatible encoding:
```
lzcnt eax,eax = rep bsr eax,eax
tzcnt eax,eax = rep bsf eax,eax
```
On older processors the `rep` prefix is simply ignored.
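To illustrate the prefix trick on the exact bytes from the question's disassembly (`F3 48 0F BC 45 08`), here is a small sketch. `decode_bit_scan` is a hypothetical helper, not a real disassembler: it assumes 64-bit mode and only recognizes the `[F3] [REX] 0F BC/BD` forms discussed here (`0F BC` = `bsf`, `0F BD` = `bsr`, and with an `F3` prefix `tzcnt`/`lzcnt` on CPUs that support them).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: names the instruction a CPU would execute for these bytes.
// Only handles [F3] [REX] 0F BC|BD in 64-bit mode; anything else is "other".
std::string decode_bit_scan(const std::uint8_t* bytes, std::size_t len,
                            bool has_bmi1, bool has_lzcnt) {
    assert(bytes != nullptr || len == 0);
    std::size_t i = 0;
    bool rep = false;
    if (i < len && bytes[i] == 0xF3) { rep = true; ++i; }  // REP/REPE prefix
    if (i < len && (bytes[i] & 0xF0) == 0x40) ++i;         // optional REX (40h-4Fh)
    if (i + 1 >= len || bytes[i] != 0x0F) return "other";
    switch (bytes[i + 1]) {
    case 0xBC:  // 0F BC = bsf; F3 0F BC = tzcnt on BMI1 CPUs
        if (rep && has_bmi1) return "tzcnt";
        return rep ? "bsf (rep ignored)" : "bsf";
    case 0xBD:  // 0F BD = bsr; F3 0F BD = lzcnt on CPUs with LZCNT
        if (rep && has_lzcnt) return "lzcnt";
        return rep ? "bsr (rep ignored)" : "bsr";
    default:
        return "other";
    }
}
```

Fed the question's `tzcnt rax,qword ptr [r]` bytes, it returns `"tzcnt"` when BMI1 is present and `"bsf (rep ignored)"` otherwise, which is what the Sandy Bridge machine actually executed.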
So much for the good news.
The bad news is that the two versions have different semantics:
```
lzcnt eax,zero        => eax = 32,        CF=1, ZF=0
bsr   eax,zero        => eax = undefined,       ZF=1
lzcnt eax,0xFFFFFFFF  => eax = 0,         CF=0, ZF=1  // dest = number of leading zeros (from the msb)
bsr   eax,0xFFFFFFFF  => eax = 31,              ZF=0  // dest = bit index of highest set bit

tzcnt eax,zero        => eax = 32,        CF=1, ZF=0
bsf   eax,zero        => eax = undefined,       ZF=1
tzcnt eax,0xFFFFFFFF  => eax = 0,         CF=0, ZF=1  // dest = number of trailing zeros (from the lsb)
bsf   eax,0xFFFFFFFF  => eax = 0,               ZF=0  // dest = bit index of lowest set bit
```
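The table can be turned into a small executable reference model. This is a sketch written for this answer, not vendor code: the `BitScan` struct and the `*32` function names are hypothetical, destinations that Intel documents as undefined are flagged with `defined = false`, and flags that are undefined after `bsf`/`bsr` (such as CF) are simply modeled as `false`.

```cpp
#include <cstdint>

// One 32-bit bit-count result plus the flags listed in the table.
struct BitScan {
    std::uint32_t value;  // destination register
    bool defined;         // false where the destination is architecturally undefined
    bool cf;              // carry flag (undefined after bsf/bsr; modeled as false)
    bool zf;              // zero flag
};

// tzcnt: number of trailing zero bits, 32 for a zero source. CF = (src == 0), ZF = (result == 0).
constexpr BitScan tzcnt32(std::uint32_t src) {
    std::uint32_t n = 0;
    while (n < 32 && ((src >> n) & 1u) == 0) ++n;
    return BitScan{n, true, src == 0, n == 0};
}

// lzcnt: number of leading zero bits, 32 for a zero source. Same flag rules as tzcnt.
constexpr BitScan lzcnt32(std::uint32_t src) {
    std::uint32_t n = 0;
    while (n < 32 && ((src >> (31 - n)) & 1u) == 0) ++n;
    return BitScan{n, true, src == 0, n == 0};
}

// bsf: index of the lowest set bit. Zero source: ZF = 1, destination undefined.
constexpr BitScan bsf32(std::uint32_t src) {
    if (src == 0) return BitScan{0, false, false, true};
    return BitScan{tzcnt32(src).value, true, false, false};
}

// bsr: index of the highest set bit. Zero source: ZF = 1, destination undefined.
constexpr BitScan bsr32(std::uint32_t src) {
    if (src == 0) return BitScan{0, false, false, true};
    return BitScan{31 - lzcnt32(src).value, true, false, false};
}
```

For example, `tzcnt32(0xE000).value` and `bsf32(0xE000).value` are both 13, the value printed in the question, while only the `tzcnt` model gives a defined result (32, with CF set) for a zero source.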
At least `bsf` and `tzcnt` produce the same result when the source is non-zero. `bsr` and `lzcnt` do not agree even then: for a non-zero 32-bit source, `lzcnt` = 31 - `bsr`.
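The `lzcnt` = 31 - `bsr` relationship is also how `lzcnt` semantics can be recovered on a CPU that only has `bsr`. A minimal sketch, assuming MSVC's `_BitScanReverse` or GCC/Clang's `__builtin_clz` (neither gives a usable result for zero, hence the explicit check); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper: index of the highest set bit, i.e. what bsr computes.
// Precondition: x != 0 (bsr gives no usable index for a zero source).
inline unsigned highest_set_bit_index(std::uint32_t x) {
    assert(x != 0);
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanReverse(&index, x);
    return static_cast<unsigned>(index);
#elif defined(__GNUC__)
    return 31u - static_cast<unsigned>(__builtin_clz(x));  // __builtin_clz(0) is undefined
#else
    unsigned index = 31;
    while (((x >> index) & 1u) == 0) --index;  // terminates because x has a set bit
    return index;
#endif
}

// lzcnt semantics from a bsr-style primitive: handle zero explicitly,
// otherwise the leading-zero count is 31 minus the highest bit index.
inline unsigned lzcnt_via_bsr(std::uint32_t x) {
    return x == 0 ? 32u : 31u - highest_set_bit_index(x);
}
```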
Also, on some processors (notably AMD's), `lzcnt` and `tzcnt` execute much faster than `bsr`/`bsf`.
It totally sucks that `bsf` and `tzcnt` cannot agree on flag usage: `bsf` signals a zero source through ZF, `tzcnt` through CF.
This needless inconsistency means that I cannot use `tzcnt` as a drop-in replacement for `bsf` unless I can be sure its source is non-zero.
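If existing code relied on `bsf` setting ZF for a zero source, one way to make it correct regardless of whether `bsf` or `tzcnt` ends up executing is to test for zero explicitly. The sketch below uses a hypothetical name, `lowest_set_bit`, mirrors the return convention of MSVC's `_BitScanForward` (false for zero), and assumes `_BitScanForward` on MSVC or `__builtin_ctz` on GCC/Clang, with a plain loop elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper: stores the index of the lowest set bit of x in *index and
// returns true, or returns false (leaving *index untouched) when x == 0.
inline bool lowest_set_bit(std::uint32_t x, unsigned* index) {
    assert(index != nullptr);
    if (x == 0) return false;  // the one input where bsf and tzcnt disagree
#if defined(_MSC_VER)
    unsigned long i;
    _BitScanForward(&i, x);    // safe because x != 0 here
    *index = static_cast<unsigned>(i);
#elif defined(__GNUC__)
    *index = static_cast<unsigned>(__builtin_ctz(x));  // undefined for 0, excluded above
#else
    unsigned i = 0;
    while (((x >> i) & 1u) == 0) ++i;  // terminates because x has a set bit
    *index = i;
#endif
    return true;
}
```

Because a zero source never reaches the bit-scan primitive, the result no longer depends on which flag (ZF or CF) the underlying instruction sets.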