I have read quite a lot about OpenCL, and it seems to be the most promising (the only?) multi-architecture library. OpenCL is meant to be the first standard for programming parallel architectures, and most programmers will probably adopt it eventually. That is all well and good, but is there a loss of performance when migrating from a native programming library to OpenCL? In the case of NVIDIA GeForce cards, I have already found an article where two implementations of the same program - CUDA vs. OpenCL code - were compared, and the CUDA one seemed to perform better. In the case of Pthreads or Windows threads, I really have no idea, but I think that "generality" and a multi-architecture approach will always come at some cost. Rather than speculate, I'd like to check everything myself, but I need your help! Is there a universally accepted OpenCL benchmark set that I can use to compare against native code? Is there an equivalent of the CUDA SDK written in OpenCL? Thanks to everybody.
Not being a performance/benchmarking expert, I can only try to give you a few general thoughts on OpenCL vs. CUDA. Fair warning though, I might get some stuff wrong.
The problem with benchmarks is obviously that you can only objectively evaluate very specific things - say, the same program done in CUDA and OpenCL, on the same hardware (as in the article you mention). But you can't deduce from that experiment that you'll get similar results with another program, or with different hardware. Results will differ, so you would need a big test suite. This is what you are asking for, but I don't know of anything like that - people choose one technology or the other for their bigger projects and don't write everything twice.
There are the NVIDIA Code Examples, done in both CUDA and OpenCL. You could choose a few and compare your results.
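If you do want to compare a few of those examples, a small timing harness keeps the comparison consistent. The following is only a rough sketch under my own assumptions: the function names (`time_runs`, `relative_difference`, `compare`) are made up for illustration, and the two placeholder workloads stand in for whatever launches your CUDA and OpenCL builds. It measures wall-clock time around each call, so it includes startup and data transfer, not just kernel time.

```python
import statistics
import time


def time_runs(fn, repeats=10, warmup=2):
    """Return the wall-clock durations (seconds) of `repeats` calls to fn."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-off costs such as kernel compilation
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def relative_difference(baseline, candidate):
    """Fraction by which the candidate's median time exceeds the baseline's."""
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return (cand - base) / base


def compare(pairs, repeats=10):
    """`pairs` maps a benchmark name to a (cuda_fn, opencl_fn) tuple of callables."""
    report = {}
    for name, (cuda_fn, opencl_fn) in pairs.items():
        cuda_times = time_runs(cuda_fn, repeats)
        opencl_times = time_runs(opencl_fn, repeats)
        report[name] = relative_difference(cuda_times, opencl_times)
    return report


if __name__ == "__main__":
    # Placeholder workloads (assumption): replace them with calls that launch
    # the real CUDA and OpenCL builds of the same sample, e.g. via subprocess.
    def run_cuda_build():
        sum(i * i for i in range(100_000))

    def run_opencl_build():
        sum(i * i for i in range(100_000))

    pairs = {"placeholder": (run_cuda_build, run_opencl_build)}
    for name, diff in compare(pairs).items():
        print(f"{name}: OpenCL median is {diff:+.1%} relative to CUDA")
```

Medians are used instead of means so that a single slow run does not skew the result. Even so, a figure from one sample on one card says little about other programs or other hardware.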
I don't think that would be time well spent, though. Maybe you should approach this problem from another angle: what can you do with one of the frameworks that you can't do with the other? They both use the same drivers, so both will support fancy technologies that come out with new hardware. Thread scheduling is done in hardware, so they have the same performance there. What remains to be tested are things like:
From my tests, the answer to these questions - will my code use the hardware optimally - is yes for both frameworks. So they definitely play in the same league, and even if one is 5% faster than the other for some specific problem at the moment, I think it would not make a difference in the big picture.
I intentionally didn't write anything about the other use cases of OpenCL, e.g. on CPUs. This field is much wider, as you have different OSes, even different OpenCL SDKs for the same processors (e.g. Apple and Intel) and lots of ways to write parallel programs without OpenCL (to compare against).