I'm just starting to learn C++ AMP. I've built a few examples with the VS 2012 RC, but I'm finding that the GPU versions run slower than the CPU versions. For instance, take the examples by Kate Gregory: http://ampbook.codeplex.com/releases/view/90595 (they accompany her upcoming book, http://www.gregcons.com/cppamp/). She demonstrated them in a lecture I watched, where she got a ~5x performance improvement on the chapter 4 example by using her laptop's GPU (I believe she said it was a 6650) instead of the CPU (I'm not sure which CPU she had). I've tested the example myself on a couple of system configurations and have always found the CPU to be faster. I've tested other examples and found the same. Am I doing something wrong? Is there a reason for the slower-than-expected performance? Does anyone have an example that would definitely show the GPU being faster?
Example results: the chapter4 project takes 1.15 ms on the CPU, 2.57 ms on the GPU, and 2.55 ms on the GPU tiled.
Edit:
Doh, I think I just found the reason why - the values for the size of the matrices she used in the lecture were different. The sample on the website uses M=N=W=64. If I use 64, 512 and 256 as she did in the lecture then I get the corresponding ~5x increase in performance.
It seems like your overarching question is WHY moving things to the GPU doesn't always get you a benefit. The answer is copy time. Imagine a calculation that takes time proportional to n squared, while copying the data takes time proportional to n. You might need quite a large n before the time spent copying to and from the GPU is outweighed by the time saved doing the calculation there.
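To make that trade-off concrete, here is a minimal sketch in plain C++ (not C++ AMP) of a toy cost model. Everything in it is an assumption for illustration: the `CostModel` struct, the function names, and the per-element and per-operation costs are hypothetical, not measurements from any real hardware. It only shows how a linear copy cost plus a cheaper quadratic compute cost eventually beats a more expensive quadratic compute cost.

```cpp
#include <cstddef>

// Hypothetical cost model. All constants are illustrative assumptions,
// not measurements from a real CPU or GPU.
struct CostModel {
    double copy_per_element;  // cost to move one element to and from the GPU
    double cpu_per_op;        // CPU cost per basic operation
    double gpu_per_op;        // GPU cost per basic operation
};

// CPU: only the n^2 calculation.
double cpu_time(const CostModel& m, double n) {
    return m.cpu_per_op * n * n;
}

// GPU: a copy proportional to n plus the (cheaper) n^2 calculation.
double gpu_time(const CostModel& m, double n) {
    return m.copy_per_element * n + m.gpu_per_op * n * n;
}

// Smallest n in [1, limit] for which the GPU is strictly faster,
// or 0 if the GPU never wins within the limit.
std::size_t break_even(const CostModel& m, std::size_t limit) {
    for (std::size_t n = 1; n <= limit; ++n) {
        const double x = static_cast<double>(n);
        if (gpu_time(m, x) < cpu_time(m, x)) {
            return n;
        }
    }
    return 0;
}
```

With the made-up values `CostModel{96.0, 1.0, 0.25}`, `break_even(model, 10000)` returns 129: below that size the copy dominates and the CPU wins, above it the GPU wins. Real break-even points depend on the hardware and the algorithm, so measure rather than trust a model like this.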
The book mentions this briefly in the early chapters, and Chapters 7 and 8 are all about performance and optimization. Chapter 7 is on Rough Cuts now; Chapter 8 should be there shortly. (Its code is already on Codeplex - the Reduction case study.)
I've just checked in an update to the Chapter 4 code that uses the Tech Ed starting numbers instead of the ones that were there before. Smaller matrices lose too much time to the copy to/from the GPU; larger ones take too long to be a good demo. But do feel free to play around with the sizes. Make them even larger since you don't mind a minute or two of "dead air", and see what happens.
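If you want a rough feel for why the sizes matter before experimenting, the sketch below counts the work and the data moved for a matrix multiply. It is plain C++ and makes several assumptions: that the product is an M x W matrix times a W x N matrix, that both inputs are copied to the GPU and the M x N result is copied back, and that the question's "64, 512 and 256" map to M, N and W in that order. The function names are hypothetical helpers, not part of the sample code.

```cpp
#include <cstdint>

// Multiply-add operations for C (M x N) = A (M x W) * B (W x N).
std::uint64_t multiply_adds(std::uint64_t M, std::uint64_t N, std::uint64_t W) {
    return M * N * W;
}

// Elements moved, assuming A and B are copied to the GPU
// and C is copied back (an assumption about the sample).
std::uint64_t elements_copied(std::uint64_t M, std::uint64_t N, std::uint64_t W) {
    return M * W + W * N + M * N;
}

// Work done per element copied: higher means the copy is easier to amortize.
double work_per_element(std::uint64_t M, std::uint64_t N, std::uint64_t W) {
    return static_cast<double>(multiply_adds(M, N, W)) /
           static_cast<double>(elements_copied(M, N, W));
}
```

Under those assumptions, M=N=W=64 gives 262,144 multiply-adds for 12,288 elements copied, while 64, 512, 256 gives 8,388,608 multiply-adds for 180,224 elements copied. The larger case does 32 times as much arithmetic for under 15 times as much data, and the extra total work also helps amortize any fixed per-launch overhead.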