 

What does programming for PS3's Cell Processor entail?

Tags:

ps3

How is programming for the Cell Processor on the PS3 different from programming for any other processor found on a normal desktop?

What kinds of programming paradigms, techniques, and practices are used to fully utilize the Cell Processor's potential?

All the articles I see concerning PS3 development discuss "learning how to program on the Cell Processor." What does this really mean, beyond some hand-waving?

asked Aug 31 '09 by KingNestor

People also ask

Why is the Cell processor hard to program for?

The Cell processor has multiple cores, but each core serves a somewhat different function, so applications have to be designed much like software that targets multiple GPUs.

Why is the PS3 so hard to develop for?

Dr. Dobb's Journal tested the development process of the PlayStation 3 and found that Sony's console is "difficult to program for." The report's authors went on to explain that "software that exploits the Cell's potential requires a development effort significantly greater than traditional platforms."

What does the Cell processor do?

Cell is a multi-core microprocessor microarchitecture that combines a general-purpose PowerPC core of modest performance with streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation.

How powerful is the PS3 CPU?

At its heart is the Power Processing Element, or PPE: a 3.2GHz, single-core, dual-threaded CPU based on IBM's PowerPC 2.02 ISA (Instruction Set Architecture). Alongside it, the platform made use of eight co-processors, dubbed Synergistic Processing Elements, or SPEs, also clocked at 3.2GHz.


1 Answer

In addition to everything George mentions, the SPUs are really better thought of as streaming vector processors. They work best when you have an algorithm that works on long sequences of numerical data, which can be fed through the SPU's small local store (256 KB) via DMA, rather than having the SPU load a chunk of memory, try to operate on it, find that it needs to follow a pointer somewhere outside its local store, load that, keep going, find another one, and so on.
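For illustration, here is a rough sketch of that kind of double-buffered streaming loop, using the MFC DMA intrinsics from the Cell SDK's spu_mfcio.h; the chunk size, tag numbers, and the trivial reduction kernel are made up for the example:

```c
/* SPU-side sketch: stream a large float array through local store with
 * double-buffered DMA so the next transfer overlaps the current compute.
 * Assumes total_bytes is a nonzero multiple of CHUNK; illustrative only. */
#include <spu_intrinsics.h>
#include <spu_mfcio.h>

#define CHUNK 16384                              /* bytes per DMA transfer */
static float buf[2][CHUNK / sizeof(float)] __attribute__((aligned(128)));

float sum_stream(unsigned long long ea, unsigned int total_bytes)
{
    vector float acc = spu_splats(0.0f);
    int cur = 0;

    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);     /* kick off the first DMA */

    for (unsigned int off = 0; off < total_bytes; off += CHUNK) {
        int next = cur ^ 1;
        if (off + CHUNK < total_bytes)           /* prefetch the next chunk */
            mfc_get(buf[next], ea + off + CHUNK, CHUNK, next, 0, 0);

        mfc_write_tag_mask(1 << cur);            /* wait only for the chunk */
        mfc_read_tag_status_all();               /* we are about to use     */

        vector float *v = (vector float *)buf[cur];
        for (unsigned int i = 0; i < CHUNK / sizeof(vector float); ++i)
            acc = spu_add(acc, v[i]);            /* purely local SIMD work  */

        cur = next;
    }
    return spu_extract(acc, 0) + spu_extract(acc, 1)
         + spu_extract(acc, 2) + spu_extract(acc, 3);
}
```

While one buffer is being crunched, the DMA engine is already pulling the next one in, so the SPU never sits idle waiting on main memory as long as the compute per chunk outlasts the transfer.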

So, programming for them isn't a simple model of concurrency and threads; it's more like high performance numerical or scientific computation. It is also non-uniform memory access taken to an extreme.
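To make that contrast concrete, here is an illustrative comparison of a pointer-heavy layout that fights this model versus a flat structure-of-arrays layout that streams well; the struct names are hypothetical:

```c
/* Hard to stream: each particle update chases a pointer back into main
 * memory, so the SPU would stall on a DMA round-trip per element.       */
struct particle_node {
    float pos[3], vel[3];
    struct particle_node *next;
};

/* Easy to stream: contiguous arrays can be pulled into local store in
 * big chunks and processed with SIMD, as in the DMA sketch above.       */
struct particle_soa {
    float *pos_x, *pos_y, *pos_z;    /* each a flat array of count floats */
    float *vel_x, *vel_y, *vel_z;
    unsigned int count;
};
```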

Furthermore, both the PPE and the SPUs are in-order cores with deep pipelines, so the programmer has to be much more aware of data hazards, instruction bubbles, and all the numerous micro-optimizations that we are told the compiler "should" take care of for us (but it really doesn't). Things like mispredicted branches, load-hit-stores, cache misses, etc. hurt a lot more than they would on an out-of-order processor that could juggle the order of operations around to hide such latencies.
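As one concrete example of the kind of micro-optimization this forces on you, an unpredictable branch can often be replaced with a compare-and-select, so the in-order pipeline never has to guess. A rough sketch using spu_cmpgt/spu_sel from spu_intrinsics.h (illustrative only):

```c
#include <spu_intrinsics.h>

/* Branchy version: on an in-order core with no real branch predictor,
 * each mispredicted 'if' costs a pipeline flush (roughly 18 cycles on
 * the SPU).                                                             */
float clamp_branchy(float x, float lo, float hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Branchless version: compute both outcomes and pick with a mask.
 * spu_cmpgt yields an all-ones/all-zeros mask; spu_sel selects per bit. */
vector float clamp_branchless(vector float x, vector float lo, vector float hi)
{
    x = spu_sel(x, lo, spu_cmpgt(lo, x));   /* x = max(x, lo) */
    x = spu_sel(x, hi, spu_cmpgt(x, hi));   /* x = min(x, hi) */
    return x;
}
```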

For concrete examples, check out Mike Acton's CellPerformance blog. Mike is my favorite old-school assembly-happy perf curmudgeon in the business, and he's really earned his chops on this issue.

answered Oct 05 '22 by Crashworks