
What are the technical reasons behind the "Itanium fiasco", if any? [closed]

In this article John Dvorak calls Itanium "one of the great fiascos of the last 50 years". While he describes the over-optimistic market expectations and the dramatic financial outcome of the idea, he doesn't go into the technical details of this epic fail. I had the chance to work with Itanium for some time, and I personally loved its architecture: it was so clear, simple, and straightforward compared to the modern x86 processor architecture...

So what are (or were) the technical reasons for its failure? Under-performance? Incompatibility with x86 code? Compiler complexity? Why did this "Itanic" sink?

[Image: Itanium processor block]

Max Galkin asked Jun 18 '09

People also ask

Why did Intel Itanium fail?

Put simply, Itanium failed in part because Intel pushed a task into software that software compilers aren't capable of addressing all that effectively.

Why did VLIW fail?

Itanium failed because writing compilers that make VLIW efficient is probably an insurmountable problem for human minds. Donald Knuth himself said that compilers that would make it perform even adequately were nearly impossible to write.

What is an Itanium based system?

Itanium-based processors have the ability to handle intensive computing needs of business-critical applications in an enterprise-level environment. An Itanium processor uses a whole new architecture, not just extending the 32-bit architecture to 64-bit, and it can thus be called a native 64-bit processor.


1 Answer

Itanium failed because VLIW for today's workloads is simply an awful idea.

Donald Knuth, a widely respected computer scientist, said in a 2008 interview that "the 'Itanium' approach [was] supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write."[1]

That pretty much nails the problem.

For scientific computation, where you get at least a few dozen instructions per basic block, VLIW probably works fine. There are enough instructions there to create good bundles. For more modern workloads, where you often get only about 6-7 instructions per basic block (that's the average, IIRC, for SPEC2000), it simply doesn't. The compiler can't find enough independent instructions to put in the bundles, as the sketch below illustrates.
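To make that concrete, here is an illustrative C snippet (my own sketch, not from the original answer): a pointer-chasing loop typical of "modern workload" code, where nearly every operation depends on the result of the previous one, so a VLIW compiler has almost nothing independent to pack into a wide bundle.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

/* A typical "modern workload" basic block: each step depends on the
 * previous load, so there are almost no independent instructions for
 * a compiler to pack into a wide VLIW bundle. An out-of-order core
 * faces the same dependence chain, but it can overlap work from
 * later iterations or other basic blocks at run time. */
int sum_list(const struct node *n)
{
    int sum = 0;
    while (n != NULL) {        /* the branch ends the basic block   */
        sum += n->value;       /* depends on the load of n          */
        n = n->next;           /* this load feeds the next iteration */
    }
    return sum;
}
```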

Modern x86 processors, with the exception of Intel Atom (pre Silvermont) and I believe AMD E-3**/4**, are all out-of-order processors. They maintain a dynamic instruction window of roughly 100 instructions, and within that window they execute instructions whenever their inputs become ready. If multiple instructions are ready to go and they don't compete for resources, they go together in the same cycle.

So how is this different from VLIW? The first key difference between VLIW and out-of-order is that the out-of-order processor can choose instructions from different basic blocks to execute at the same time. Those instructions are executed speculatively anyway (based primarily on branch prediction). The second key difference is that out-of-order processors determine these schedules dynamically (i.e., each dynamic instruction is scheduled independently; the VLIW compiler operates on static instructions).
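As a rough sketch of why dynamic scheduling matters (again my own illustration, not part of the answer): the two sums below are independent, but whether each load hits or misses in the cache is only known at run time, so a static VLIW schedule has to guess, while an out-of-order core simply keeps going.

```c
#include <stddef.h>

/* Illustration of static vs. dynamic scheduling. The two
 * accumulations are independent, but the latency of the load of
 * a[i] (cache hit or miss) is unknown at compile time. An
 * out-of-order core keeps executing the b[i] work while a miss is
 * outstanding; a VLIW compiler must commit to one fixed schedule
 * and cannot react to what actually happens at run time. */
long mixed_sums(const long *a, const long *b, size_t n)
{
    long sa = 0, sb = 0;
    for (size_t i = 0; i < n; i++) {
        sa += a[i];   /* may miss in cache: latency unknown statically   */
        sb += b[i];   /* independent work an OoO core can run meanwhile  */
    }
    return sa + sb;
}
```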

The third key difference is that implementations of out-of-order processors can be as wide as wanted, without changing the instruction set (Intel Core has 5 execution ports, other processors have 4, etc). VLIW machines can and do execute multiple bundles at once (if they don't conflict). For example, early Itanium CPUs execute up to 2 VLIW bundles per clock cycle, 6 instructions, with later designs (2011's Poulson and later) running up to 4 bundles = 12 instructions per clock, with SMT to take those instructions from multiple threads. In that respect, real Itanium hardware is like a traditional in-order superscalar design (like P5 Pentium or Atom), but with more / better ways for the compiler to expose instruction-level parallelism to the hardware (in theory, if it can find enough, which is the problem).
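For reference, the bundles the answer mentions are 128 bits wide: three 41-bit instruction slots plus a 5-bit template field that encodes which functional-unit types the slots target and where the dependence "stops" fall. The struct below is only a simplified model of those encoding sizes, not real decoder code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of an IA-64 instruction bundle: three 41-bit
 * instruction slots plus a 5-bit template field, 128 bits in total.
 * The template names the unit types (M, I, F, B, ...) for the slots
 * and marks the stop positions. Purely illustrative. */
typedef struct {
    uint64_t slot[3];  /* each element holds one 41-bit instruction */
    uint8_t  tmpl;     /* 5-bit template / stop-bit encoding        */
} ia64_bundle;

int main(void)
{
    /* The architectural encoding packs 3 * 41 + 5 = 128 bits. */
    assert(3 * 41 + 5 == 128);
    return 0;
}
```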

Performance-wise, with similar specs (caches, cores, etc.), modern out-of-order x86 processors just beat the crap out of Itanium.

So why would one buy an Itanium now? Well, the only reason really is HP-UX. If you want to run HP-UX, that's the way to do it...

Many compiler writers don't see it this way - they always liked the fact that Itanium gives them more to do, puts them back in control, etc. But they won't admit how miserably it failed.


Footnote 1:

This was part of a response about the value of multi-core processors. Knuth was saying parallel processing is hard to take advantage of; finding and exposing fine-grained instruction-level parallelism (and explicit speculation: EPIC) at compile time for a VLIW is also a hard problem, and somewhat related to finding coarse-grained parallelism to split a sequential program or function into multiple threads to automatically take advantage of multiple cores.

11 years later he's still basically right: per-thread performance is still very important for most non-server software, and something CPU vendors focus on, because many cores are no substitute.

Vlad Petric answered Oct 13 '22