
What are the technical reasons behind the "Itanium fiasco", if any? [closed]

In this article John Dvorak calls Itanium "one of the great fiascos of the last 50 years". While he describes the over-optimistic market expectations and the dramatic financial outcome of the idea, he doesn't go into the technical details of this epic fail. I had the chance to work with Itanium for some time, and I personally loved its architecture: it was so clear, simple, and straightforward compared to the modern x86 processor architecture...

So what are (or were) the technical reasons for its failure? Under-performance? Incompatibility with x86 code? Compiler complexity? Why did this "Itanic" sink?

[Image: Itanium processor block]

Max Galkin asked Jun 18 '09

People also ask

Why did Intel Itanium fail?

Put simply, Itanium failed in part because Intel pushed a task into software that software compilers aren't capable of addressing all that effectively.

Why did VLIW fail?

Itanium failed because writing compilers that make VLIW efficient is probably an insurmountable problem for human minds. Donald Knuth himself said that compilers that would make it perform even adequately were nearly impossible to write.

What is an Itanium based system?

Itanium-based processors have the ability to handle intensive computing needs of business-critical applications in an enterprise-level environment. An Itanium processor uses a whole new architecture, not just extending the 32-bit architecture to 64-bit, and it can thus be called a native 64-bit processor.


1 Answer

Itanium failed because VLIW for today's workloads is simply an awful idea.

Donald Knuth, a widely respected computer scientist, said in a 2008 interview that "the 'Itanium' approach [was] supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write."[1]

That pretty much nails the problem.

For scientific computation, where you get at least a few dozen instructions per basic block, VLIW probably works fine. There are enough instructions there to create good bundles. For more modern workloads, where you often get only about 6-7 instructions per basic block (that's the average, IIRC, for SPEC2000), it simply doesn't. The compiler can't find enough independent instructions to put in the bundles, as the sketch below illustrates.
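To make that concrete, here is an illustrative C snippet (my own sketch, not from the original answer): a pointer-chasing loop typical of "modern workload" code, where nearly every operation depends on the result of the previous one, so a VLIW compiler has almost nothing independent to pack into a wide bundle.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

/* A typical "modern workload" basic block: each step depends on the
 * previous load, so there are almost no independent instructions for
 * a compiler to pack into a wide VLIW bundle. An out-of-order core
 * faces the same dependence chain, but it can overlap work from
 * later iterations or other basic blocks at run time. */
int sum_list(const struct node *n)
{
    int sum = 0;
    while (n != NULL) {        /* the branch ends the basic block   */
        sum += n->value;       /* depends on the load of n          */
        n = n->next;           /* this load feeds the next iteration */
    }
    return sum;
}
```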

Modern x86 processors, with the exception of Intel Atom (pre Silvermont) and I believe AMD E-3**/4**, are all out-of-order processors. They maintain a dynamic instruction window of roughly 100 instructions, and within that window they execute instructions whenever their inputs become ready. If multiple instructions are ready to go and they don't compete for resources, they go together in the same cycle.

So how is this different from VLIW? The first key difference between VLIW and out-of-order is that the out-of-order processor can choose instructions from different basic blocks to execute at the same time. Those instructions are executed speculatively anyway (based primarily on branch prediction). The second key difference is that out-of-order processors determine these schedules dynamically (i.e., each dynamic instruction is scheduled independently; the VLIW compiler operates on static instructions).
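As a rough sketch of why dynamic scheduling matters (again my own illustration, not part of the answer): the two sums below are independent, but whether each load hits or misses in the cache is only known at run time, so a static VLIW schedule has to guess, while an out-of-order core simply keeps going.

```c
#include <stddef.h>

/* Illustration of static vs. dynamic scheduling. The two
 * accumulations are independent, but the latency of the load of
 * a[i] (cache hit or miss) is unknown at compile time. An
 * out-of-order core keeps executing the b[i] work while a miss is
 * outstanding; a VLIW compiler must commit to one fixed schedule
 * and cannot react to what actually happens at run time. */
long mixed_sums(const long *a, const long *b, size_t n)
{
    long sa = 0, sb = 0;
    for (size_t i = 0; i < n; i++) {
        sa += a[i];   /* may miss in cache: latency unknown statically   */
        sb += b[i];   /* independent work an OoO core can run meanwhile  */
    }
    return sa + sb;
}
```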

The third key difference is that implementations of out-of-order processors can be as wide as wanted, without changing the instruction set (Intel Core has 5 execution ports, other processors have 4, etc). VLIW machines can and do execute multiple bundles at once (if they don't conflict). For example, early Itanium CPUs execute up to 2 VLIW bundles per clock cycle, 6 instructions, with later designs (2011's Poulson and later) running up to 4 bundles = 12 instructions per clock, with SMT to take those instructions from multiple threads. In that respect, real Itanium hardware is like a traditional in-order superscalar design (like P5 Pentium or Atom), but with more / better ways for the compiler to expose instruction-level parallelism to the hardware (in theory, if it can find enough, which is the problem).
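For reference, the bundles the answer mentions are 128 bits wide: three 41-bit instruction slots plus a 5-bit template field that encodes which functional-unit types the slots target and where the dependence "stops" fall. The struct below is only a simplified model of those encoding sizes, not real decoder code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of an IA-64 instruction bundle: three 41-bit
 * instruction slots plus a 5-bit template field, 128 bits in total.
 * The template names the unit types (M, I, F, B, ...) for the slots
 * and marks the stop positions. Purely illustrative. */
typedef struct {
    uint64_t slot[3];  /* each element holds one 41-bit instruction */
    uint8_t  tmpl;     /* 5-bit template / stop-bit encoding        */
} ia64_bundle;

int main(void)
{
    /* The architectural encoding packs 3 * 41 + 5 = 128 bits. */
    assert(3 * 41 + 5 == 128);
    return 0;
}
```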

Performance-wise, with similar specs (caches, cores, etc.), modern out-of-order x86 processors just beat the crap out of Itanium.

So why would one buy an Itanium now? Well, the only reason really is HP-UX. If you want to run HP-UX, that's the way to do it...

Many compiler writers don't see it this way - they always liked the fact that Itanium gives them more to do, puts them back in control, etc. But they won't admit how miserably it failed.


Footnote 1:

This was part of a response about the value of multi-core processors. Knuth was saying parallel processing is hard to take advantage of; finding and exposing fine-grained instruction-level parallelism (and explicit speculation: EPIC) at compile time for a VLIW is also a hard problem, and somewhat related to finding coarse-grained parallelism to split a sequential program or function into multiple threads to automatically take advantage of multiple cores.

11 years later he's still basically right: per-thread performance is still very important for most non-server software, and something CPU vendors focus on, because many cores are no substitute.

Vlad Petric answered Oct 13 '22