Is the following code valid to check if a CPU supports the SSE3 instruction set?
Using the IsProcessorFeaturePresent()
function apparently does not work on Windows XP.
bool CheckSSE3() { int CPUInfo[4] = {-1}; //-- Get number of valid info ids __cpuid(CPUInfo, 0); int nIds = CPUInfo[0]; //-- Get info for id "1" if (nIds >= 1) { __cpuid(CPUInfo, 1); bool bSSE3NewInstructions = (CPUInfo[2] & 0x1) || false; return bSSE3NewInstructions; } return false; }
SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their Pentium 4 CPU.
Supplemental SSE3 (SSSE3) is supported by Intel Core 2 Duo, Intel Core i7/i5/i3, Intel Atom, AMD Bulldozer, AMD Bobcat, and later processors. SSE4. 1 is supported on Intel Core 2 (“Penryn”), Intel Core i7 (“Nehalem”), Intel Atom (Silvermont core), AMD Bulldozer, AMD Jaguar, and later processors.
If you are unsure about your particular computer, you can determine SSE2 support by: Windows: A free download, CPU-Z, is available from CPUID that will indicate if SSE2 is present on your system or not. Linux: From a terminal, run “cat /proc/cpuinfo”. “sse2” will be listed as one of the “flags” if SSE2 is available.
If you have anything newer, AVX-512 won't work. You can check this by running hardware monitoring apps such as HWInfo64, which will tell you what microcode version your system is running at (HWInfo64 lists it as MCU). Next is BIOS compatibility, which can range significantly depending on the vendor.
I've created a GitHub repro that will detect CPU and OS support for all the major x86 ISA extensions: https://github.com/Mysticial/FeatureDetector
Here's a shorter version:
First you need to access the CPUID instruction:
#ifdef _WIN32 // Windows #define cpuid(info, x) __cpuidex(info, x, 0) #else // GCC Intrinsics #include <cpuid.h> void cpuid(int info[4], int InfoType){ __cpuid_count(InfoType, 0, info[0], info[1], info[2], info[3]); } #endif
Then you can run the following code:
// Misc. bool HW_MMX; bool HW_x64; bool HW_ABM; // Advanced Bit Manipulation bool HW_RDRAND; bool HW_BMI1; bool HW_BMI2; bool HW_ADX; bool HW_PREFETCHWT1; // SIMD: 128-bit bool HW_SSE; bool HW_SSE2; bool HW_SSE3; bool HW_SSSE3; bool HW_SSE41; bool HW_SSE42; bool HW_SSE4a; bool HW_AES; bool HW_SHA; // SIMD: 256-bit bool HW_AVX; bool HW_XOP; bool HW_FMA3; bool HW_FMA4; bool HW_AVX2; // SIMD: 512-bit bool HW_AVX512F; // AVX512 Foundation bool HW_AVX512CD; // AVX512 Conflict Detection bool HW_AVX512PF; // AVX512 Prefetch bool HW_AVX512ER; // AVX512 Exponential + Reciprocal bool HW_AVX512VL; // AVX512 Vector Length Extensions bool HW_AVX512BW; // AVX512 Byte + Word bool HW_AVX512DQ; // AVX512 Doubleword + Quadword bool HW_AVX512IFMA; // AVX512 Integer 52-bit Fused Multiply-Add bool HW_AVX512VBMI; // AVX512 Vector Byte Manipulation Instructions int info[4]; cpuid(info, 0); int nIds = info[0]; cpuid(info, 0x80000000); unsigned nExIds = info[0]; // Detect Features if (nIds >= 0x00000001){ cpuid(info,0x00000001); HW_MMX = (info[3] & ((int)1 << 23)) != 0; HW_SSE = (info[3] & ((int)1 << 25)) != 0; HW_SSE2 = (info[3] & ((int)1 << 26)) != 0; HW_SSE3 = (info[2] & ((int)1 << 0)) != 0; HW_SSSE3 = (info[2] & ((int)1 << 9)) != 0; HW_SSE41 = (info[2] & ((int)1 << 19)) != 0; HW_SSE42 = (info[2] & ((int)1 << 20)) != 0; HW_AES = (info[2] & ((int)1 << 25)) != 0; HW_AVX = (info[2] & ((int)1 << 28)) != 0; HW_FMA3 = (info[2] & ((int)1 << 12)) != 0; HW_RDRAND = (info[2] & ((int)1 << 30)) != 0; } if (nIds >= 0x00000007){ cpuid(info,0x00000007); HW_AVX2 = (info[1] & ((int)1 << 5)) != 0; HW_BMI1 = (info[1] & ((int)1 << 3)) != 0; HW_BMI2 = (info[1] & ((int)1 << 8)) != 0; HW_ADX = (info[1] & ((int)1 << 19)) != 0; HW_SHA = (info[1] & ((int)1 << 29)) != 0; HW_PREFETCHWT1 = (info[2] & ((int)1 << 0)) != 0; HW_AVX512F = (info[1] & ((int)1 << 16)) != 0; HW_AVX512CD = (info[1] & ((int)1 << 28)) != 0; HW_AVX512PF = (info[1] & ((int)1 << 26)) != 0; HW_AVX512ER = (info[1] & ((int)1 << 27)) != 0; HW_AVX512VL = (info[1] & ((int)1 << 31)) != 0; HW_AVX512BW = (info[1] & ((int)1 << 30)) != 0; HW_AVX512DQ = (info[1] & ((int)1 << 17)) != 0; HW_AVX512IFMA = (info[1] & ((int)1 << 21)) != 0; HW_AVX512VBMI = (info[2] & ((int)1 << 1)) != 0; } if (nExIds >= 0x80000001){ cpuid(info,0x80000001); HW_x64 = (info[3] & ((int)1 << 29)) != 0; HW_ABM = (info[2] & ((int)1 << 5)) != 0; HW_SSE4a = (info[2] & ((int)1 << 6)) != 0; HW_FMA4 = (info[2] & ((int)1 << 16)) != 0; HW_XOP = (info[2] & ((int)1 << 11)) != 0; }
Note that this only detects whether the CPU supports the instructions. To actually run them, you also need to have operating system support.
Specifically, operating system support is required for:
ymm
registers. See Andy Lutomirski's answer for how to detect this.zmm
and mask registers. Detecting OS support for AVX512 is the same as with AVX, but using the flag 0xe6
instead of 0x6
.Mysticial's answer is a bit dangerous -- it explains how to detect CPU support but not OS support. You need to use _xgetbv to check whether the OS has enabled the required CPU extended state. See here for another source. Even gcc has made the same mistake. The meat of the code is:
bool avxSupported = false; int cpuInfo[4]; __cpuid(cpuInfo, 1); bool osUsesXSAVE_XRSTORE = cpuInfo[2] & (1 << 27) || false; bool cpuAVXSuport = cpuInfo[2] & (1 << 28) || false; if (osUsesXSAVE_XRSTORE && cpuAVXSuport) { unsigned long long xcrFeatureMask = _xgetbv(_XCR_XFEATURE_ENABLED_MASK); avxSupported = (xcrFeatureMask & 0x6) == 0x6; }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With