I understand that the answer to this question depends on the specific OpenCL implementation and the hardware, but I need to choose between sincos and native_cos followed by native_sin for use in a Mac app.
Which is expected to be faster?
You can add a mini benchmark that times every variant of a transcendental function and then alters the kernel string according to the results (for example, by prepending native_ to cos). This needs event-based profiling, and it keeps the choice portable because the decision is made on whatever device the app actually runs on. Then, once every N iterations, the app can re-run the benchmark and adjust its choice if the previous measurement turned out to be wrong.
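The selection step described above could be sketched roughly as follows. This is only an illustration in Python: the kernel name sin_cos_kernel, the helper names build_variants and pick_fastest, and the measure callback are all hypothetical. In a real app, measure would build and enqueue the kernel on a queue created with CL_QUEUE_PROFILING_ENABLE and read the start and end timestamps with clGetEventProfilingInfo.

```python
# Kernel template; the doubled braces are literal braces for str.format.
KERNEL_TEMPLATE = """
__kernel void sin_cos_kernel(__global const float *angle,
                             __global float *c,
                             __global float *s)
{{
    size_t i = get_global_id(0);
    {body}
}}
"""

# The three candidate bodies from the question (names are illustrative).
VARIANT_BODIES = {
    "sincos": "s[i] = sincos(angle[i], &c[i]);",
    "native": "c[i] = native_cos(angle[i]); s[i] = native_sin(angle[i]);",
    "plain": "c[i] = cos(angle[i]); s[i] = sin(angle[i]);",
}


def build_variants():
    """Return a dict mapping variant name to full kernel source."""
    return {
        name: KERNEL_TEMPLATE.format(body=body)
        for name, body in VARIANT_BODIES.items()
    }


def pick_fastest(sources, measure):
    """Time every variant with `measure(source) -> seconds` and
    return (name_of_fastest, all_timings)."""
    timings = {name: measure(src) for name, src in sources.items()}
    best = min(timings, key=timings.get)
    return best, timings
```

Calling pick_fastest again every N iterations, with the same measure callback, gives the periodic re-benchmark mentioned above. Note that the native_ variants trade accuracy for speed, so the fastest variant is only usable if its precision is acceptable for the app.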
You can even benchmark permutations of a series of functions (for example, native for the first function, non-native for the second, and native for the third in one version, then a different mix of native and non-native calls in each of the other versions) and time them all. This fits the code better to pipelined architectures, where the order of the functions matters.
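Enumerating those mixed versions can be automated. The sketch below is a hypothetical helper (nativeness_variants is not part of any OpenCL binding) that generates every native/non-native combination for an expression template; each resulting string would then be spliced into a kernel and timed.

```python
from itertools import product


def nativeness_variants(expr_template, functions):
    """Yield (flags, expression) for every native/non-native combination.

    `expr_template` uses str.format placeholders named after the functions,
    e.g. "{sin}(x) * {cos}(y)". flags[i] is True when functions[i] is
    replaced by its native_ counterpart.
    """
    for flags in product((False, True), repeat=len(functions)):
        names = {
            func: ("native_" + func if use_native else func)
            for func, use_native in zip(functions, flags)
        }
        yield flags, expr_template.format(**names)


# Three functions give 2**3 = 8 candidate expressions.
for flags, expr in nativeness_variants(
    "{sin}(x) * {cos}(y) + {exp}(z)", ("sin", "cos", "exp")
):
    print(flags, expr)
```

The number of variants doubles with each function, so for long expressions it may be more practical to benchmark only a subset. A function that appears several times in the template gets the same choice at every occurrence in this sketch.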