Are there any still-relevant CPUs (Intel/AMD/Atom) which don't support SSSE3 instructions?
What's the most recent CPU without SSSE3?
Processor support: Supplemental SSE3 (SSSE3) is supported by Intel Core 2 Duo, Intel Core i7/i5/i3, Intel Atom, AMD Bulldozer, and later processors.
In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of their Athlon 64 CPUs. The earlier SIMD instruction sets on the x86 platform, from oldest to newest, are MMX, 3DNow! (developed by AMD, but not supported by Intel processors), SSE, and SSE2.
Intel has had SSE4.2 for about 12 years now, since it was introduced with their Nehalem architecture; AMD has had it for about 9 years. Basically every Intel Core i3, i5, i7, and AMD FX and Ryzen CPU supports it as well.
SSE2 (Streaming SIMD Extensions 2) is one of the Intel SIMD (Single Instruction, Multiple Data) processor supplementary instruction sets first introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set, and is intended to fully replace MMX.
The most recent CPUs without SSSE3 are based on the AMD K10 microarchitecture.
K10 CPUs support SSE3 (FP instructions like movddup and haddps), and AMD-only SSE4a. Some early K8 cores only have SSE2, but later K8 also had SSE3.
Notice that AMD CPUs listed in https://en.wikipedia.org/wiki/SSSE3#CPUs_with_SSSE3 only start at Bulldozer, but do include AMD's low-power Bobcat / Jaguar CPUs.
If you google "AMD Phenom II ssse3", you'll find some pages about games removing an SSSE3 requirement so they can work on Phenom II.
On Intel you have to go back as far as Pentium M / Core, because SSSE3 was introduced with Core 2. (First-gen Core 2 (Conroe/Merom) only has 64-bit wide shuffle execution units, so pshufb is relatively slow. But so is SSE2 pshufd. See "Fastest way to do horizontal float vector sum on x86".)
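For readers who haven't used it, pshufb is the byte-shuffle instruction that makes SSSE3 worth requiring. The following is only a reference model of what the 128-bit form computes, written in Python for readability. It says nothing about performance, and the function and parameter names are mine, not anything from an Intel or AMD API.

```python
def pshufb(data: bytes, control: bytes) -> bytes:
    """Model of 128-bit pshufb: pick bytes of `data` according to `control`.

    For each of the 16 result bytes:
      - if the high bit of the control byte is set, the result byte is 0;
      - otherwise the low 4 bits of the control byte index into `data`.
    """
    if len(data) != 16 or len(control) != 16:
        raise ValueError("both operands must be exactly 16 bytes")
    out = bytearray(16)
    for i, c in enumerate(control):
        if c & 0x80:
            out[i] = 0                 # high bit set: zero this byte
        else:
            out[i] = data[c & 0x0F]    # low nibble selects a source byte
    return bytes(out)


# Example control vector: reverse the byte order of a 16-byte vector.
REVERSE = bytes(range(15, -1, -1))
```

With this model, pshufb(data, REVERSE) returns the bytes of data in reverse order, and a control vector of all 0x80 bytes returns sixteen zero bytes. SSE2 alone has no single instruction that does an arbitrary byte-granularity shuffle like this.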
I think even first-gen Atom has SSSE3. https://en.wikipedia.org/wiki/Intel_Atom.
There are CPUs like AMD Geode that don't have SSE at all, but I think the point of the question is CPUs that do have SSE2/3 but not SSSE3.
There are no new mainstream CPUs being made that don't have SSE4.2, but some Phenom II CPUs are probably still in use even in 2018. The older they are, the more it's expected that new software might not work on them.
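If you want to know whether a particular Linux machine is one of these "SSE3 but not SSSE3" CPUs, one low-tech option is to look at the flags line in /proc/cpuinfo. Below is a minimal sketch, assuming a Linux x86 system; the helper names are hypothetical, and the sample flags string in the usage note is abbreviated and made up for illustration rather than copied from a real Phenom II. Note that Linux reports SSE3 as pni, not sse3.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (e.g. not an x86 Linux system)


def has_sse3_but_not_ssse3(flags: set) -> bool:
    """True for CPUs in the K10-like category: SSE3 present, SSSE3 absent."""
    # Linux names SSE3 "pni" (Prescott New Instructions).
    return "pni" in flags and "ssse3" not in flags


def read_local_flags(path: str = "/proc/cpuinfo") -> set:
    """Read and parse the flags of the machine this runs on (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as f:
        return parse_cpu_flags(f.read())
```

For example, parse_cpu_flags("flags : fpu sse sse2 pni popcnt sse4a") yields a set for which has_sse3_but_not_ssse3 is true, while adding ssse3 to that line makes it false. Real programs would normally use CPUID or a compiler's feature-detection facilities instead of parsing text; this is just a quick way to inspect a box.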
There are unfortunately still brand-new mainstream CPUs being made without AVX and BMI: Intel's Pentium and Celeron models, even for Skylake / Kaby Lake. Presumably when a die has defects in the upper 128 bits of its vector ALUs, e.g. the large FMA units, they fuse it off and disable decoding of VEX prefixes, and label it as a Pentium or Celeron (see Footnote 1). (This is presumably why Pentium/Celeron models don't support BMI1/BMI2 either; other than pext/pdep those take trivial die area.)
So we're not getting any closer to BMI1/BMI2 being baseline at some point in the future, which is really unfortunate because it's required for single-uop variable-count shifts on Intel CPUs. (shl reg, cl is 3 uops because of the cl=0 no-flag-update case being possible; SHLX / SHRX are 1 uop.) BMI1/2 is most useful when used throughout your whole code, not just in a couple of functions.
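To make the cl=0 case concrete, here is a simplified model of the architectural behaviour of a 32-bit shl reg, cl next to BMI2 shlx. It is a sketch under assumptions: only CF and ZF are tracked (the real instruction also touches SF, PF and OF), the function names are mine, and it models what the instructions compute, not how any CPU splits them into uops.

```python
MASK32 = 0xFFFFFFFF


def shl32(value: int, count: int, flags: dict):
    """Legacy shl r32, cl: returns (result, new_flags).

    The count is masked to 5 bits. If the masked count is 0, the flags
    are left untouched, so the output flags depend on the *input* flags.
    """
    value &= MASK32
    count &= 31
    new_flags = dict(flags)
    if count == 0:
        return value, new_flags                       # flags pass through unchanged
    result = (value << count) & MASK32
    new_flags["CF"] = (value >> (32 - count)) & 1     # last bit shifted out
    new_flags["ZF"] = int(result == 0)
    return result, new_flags


def shlx32(value: int, count: int, flags: dict):
    """BMI2 shlx r32: same shift result, but flags are never read or written."""
    return ((value & MASK32) << (count & 31)) & MASK32, dict(flags)
```

In this model, shl32(0x80000001, 1, {"CF": 0, "ZF": 1}) gives a result of 2 with CF set and ZF cleared, whereas a count of 0 (or 32, which masks to 0) returns the input flags as they were. shlx32 has no such data-dependent flag behaviour, which is the property the single-uop claim above relies on.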
Footnote 1: Certainly some fully-working chips get this treatment, too, especially once yields improve for a new process, but for consistency / market-segmentation they're still crippled.
But I think rep movs / rep stos (ERMSB) still work with 256-bit loads/stores, so the FP register file, load/store units, and bypass forwarding network would all still need to support full width. (And ERMSB becomes much more attractive vs. vector loops because it can use twice the width.)
I wonder if there's a way for the CPU to be rewired with fuses so it can use any 2 of the 4 128-bit lanes of FMA units that are working. We know Skylake-AVX512 can mix and match FMA units with ports 0, 1, and 5, only powering up the p5 FMA (if available) for 512-bit vectors, and combining the 256-bit FMA units on p0 and p1 as one 512-bit FMA unit. Statically doing something like that with fuses could let Intel use chips that had a defect affecting both lanes of what would have been one FMA unit.
Anyway, this is pure guesswork. It seems likely, but I don't know of any reliable source showing that Intel has ever actually done this as a way to sell chips with FMA defects. We do know that chips with defects in a whole physical core get sold as lower core-count SKUs, like a dual-core chip from a quad-core die. And quad-core i5 CPUs with only 6MB of L3 cache instead of 8MB have one of their 4 slices of L3 cache disabled, again probably to salvage dies with defects.