
What are the advantages and disadvantages of GPGPU (general-purpose GPU) development? [closed]

I am wondering what the key thing is that helps you in GPGPU development and, of course, which constraints you find unacceptable.

What comes to mind for me:

  • Key advantage: the raw power of these things
  • Key constraint: the memory model

What's your view?

Fabien Hure asked Sep 23 '08



1 Answer

You have to be careful with how you interpret Tim Sweeney's statements in that Ars interview. He's saying that having two separate platforms (the CPU and GPU), one suitable for single-threaded performance and one suitable for throughput-oriented computing, will soon be a thing of the past, as our applications and hardware grow towards one another.

The GPU grew out of technology limitations with the CPU, which made the arguably more natural algorithms like ray-tracing and photon mapping nigh-undoable at reasonable resolutions and framerates. In came the GPU, with a wildly different and restrictive programming model, but maybe 2 or 3 orders of magnitude better throughput for applications painstakingly coded to that model. The two machine models had (and still have) essentially different coding styles, languages (OpenGL, DirectX, shader languages vs. traditional desktop languages), and workflows. This makes code reuse, and even algorithm/programming skill reuse, extremely difficult, and hamstrings any developer who wants to make use of a dense parallel compute substrate into this restrictive programming model.

Finally, we're coming to a point where this dense compute substrate is similarly programmable to a CPU. Although there is still a sizeable performance delta between one "core" of these massively-parallel accelerators (though the threads of execution within, for example, an SM on the G80, are not exactly cores in the traditional sense) and a modern x86 desktop core, two factors drive convergence of these two platforms:

  • Intel and AMD are moving towards more, simpler cores on x86 chips (converging the hardware with the GPU, whose units are becoming more coarse-grained and programmable over time).
  • This and other forces are spawning many new applications that can take advantage of Data- or Thread-Level Parallelism (DLP/TLP), effectively utilizing this kind of substrate.

So, what Tim was saying is that the two distinct platforms will converge, to an even greater extent than, for instance, OpenCL affords. A salient quote from the interview:

TS: No, I see exactly where you're heading. In the next console generation you could have consoles consist of a single non-commodity chip. It could be a general processor, whether it evolved from a past CPU architecture or GPU architecture, and it could potentially run everything—the graphics, the AI, sound, and all these systems in an entirely homogeneous manner. That's a very interesting prospect, because it could dramatically simplify the toolset and the processes for creating software.

Right now, in the course of shipping Unreal 3, we have to use multiple programming languages. We use one programming language for writing pixel shaders, another for writing gameplay code, and then on PlayStation 3 we use yet another compiler to write code to run on the Cell processor. So the PlayStation 3 ends up being a particular challenge, because there you have three completely different processors from different vendors with different instruction sets and different compilers and different performance techniques. So, a lot of the complexity is unnecessary and makes load-balancing more difficult.

When you have, for example, three different chips with different programming capabilities, you often have two of those chips sitting idle for much of the time, while the other is maxed out. But if the architecture is completely uniform, then you can run any task on any part of the chip at any time, and get the best performance tradeoff that way.

Matt J answered Jan 02 '23