I am migrating from OpenCV 2.4.2 to OpenCV 3.0 to take advantage of OpenCL acceleration through the Transparent API (T-API). However, I have noticed that some algorithms take the same time whether they run on the CPU or on the GPU.
I searched the official documentation but could not find an answer.
How can I tell whether an algorithm has an OpenCL implementation in the Transparent API?
If you have an AMD GPU, use CodeXL: create a new CodeXL project, attach your application, and start a session from CodeXL. Alternatively, start your program from your IDE and attach CodeXL to the running process. When the program finishes, CodeXL automatically generates the graphs and profiling information, provided you selected an appropriate GPU profiling mode.
I once used CodeXL to see how compubench.com handles work-group sizes (on one occasion it was 32 because of memory resource requirements).
If the program does not use OpenCL, CodeXL shows a dialog window listing potential causes.
You can also see errors and warnings this way and look at the kernel source string (though it will probably be mangled).
Intel has Code Builder for the same purpose.
NVIDIA has some profilers too.
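Before reaching for a profiler, it can help to confirm that the "same time" observation is real and not measurement noise. The following is a minimal sketch of a timing harness using only the C++ standard library. The helper names (`median_ms`, `time_median_ms`, `roughly_equal`) and the 10% tolerance are hypothetical choices for illustration, not part of OpenCV or of any profiler mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Median of a non-empty set of timings, in milliseconds.
// The vector is taken by value so the caller's data is not reordered.
double median_ms(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
               ? samples[mid]
               : 0.5 * (samples[mid - 1] + samples[mid]);
}

// Calls `work` once untimed as a warm-up (the first call on an OpenCL path
// may include one-time setup cost), then times `runs` further calls and
// returns the median duration in milliseconds.
double time_median_ms(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    work();  // warm-up, not measured
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return median_ms(samples);
}

// True when the two medians differ by less than `tolerance`, measured
// relative to the larger one. The 10% default is an arbitrary assumption.
bool roughly_equal(double a_ms, double b_ms, double tolerance = 0.10) {
    const double larger = std::max(a_ms, b_ms);
    if (larger <= 0.0) return true;  // both effectively zero
    return std::abs(a_ms - b_ms) / larger < tolerance;
}
```

To use it, pass two callables that wrap the same algorithm, one for each path, to `time_median_ms` and compare the results with `roughly_equal`. In OpenCV 3.x one way to obtain the two paths, assuming `cv::ocl::haveOpenCL()` returns true on your machine, is to run the same call on a `cv::UMat` with `cv::ocl::setUseOpenCL(true)` and then with `cv::ocl::setUseOpenCL(false)`; that wiring is not shown here. If the medians come out roughly equal, that suggests, but does not prove, that the call is not being offloaded, and a profiler session as described above can confirm it.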