The question has been asked before in a slightly different form, but I'd like to know what Android developers think is really behind Google's decision, not what Google's official answer is.
OpenCL is an open standard and works on various devices, such as CPUs, desktop GPUs, ARM processors, FPGAs and DSPs. It gives us developers the convenience of creating high-performance software and libraries that work on all of these devices.
RenderScript is a higher-level language, which focuses mainly on media manipulation and runs on both CPU and GPU. It works on all new Android phones and tablets, but not on other operating systems. A big difference from OpenCL is that RenderScript always handles the scheduling itself, rather than leaving it to the software.
Google's official answer was factually incorrect about OpenCL, which frustrated many in the OpenCL community and understandably drew some strong reactions. So please be factual about both RenderScript and OpenCL; I don't want this question to be closed because of misinformation.
First, let us deal with the answer to this question by Tim Murray.
He states that the OpenCL/CUDA execution model is tied to low-level details such as register counts, local memory and the like. While this may be partly true, the OpenCL execution model was specifically developed to allow a clever developer to abstract these differences in a way that can still yield the maximum performance.
For example, should a kernel developer need such details to deal with differences between micro-architectures, the OpenCL runtime API provides clGetDeviceInfo, which exposes a plethora of information (extension information can also be retrieved this way).
Details of vector (SIMD-style) execution are also not spared. Most OpenCL implementation guides state that kernels should be written without explicit vectorization - the implementation will vectorize execution of adjacent work-items. This is also the model followed by CUDA (which does not even provide vector types anymore, but this is a different matter).
As for work-items: it is indeed possible to constrain a work-group dimension to a particular size. In practice, however, the reqd_work_group_size attribute is hardly ever used unless the dimension is known in advance (for the sake of the calculation, not performance).
Also, the OpenCL documentation for clEnqueueNDRangeKernel clearly states that
"local_work_size can also be a NULL value in which case the OpenCL implementation will determine how to be break the global work-items into appropriate work-group instances."
This is true of the Intel and AMD implementations.
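To make "the implementation will determine" a bit more concrete, here is a minimal sketch in plain C of one heuristic a runtime could use when local_work_size is NULL. The function name pick_local_work_size and the heuristic itself are my own assumptions for illustration; I am not claiming this is what Intel, AMD or any other vendor actually does. It only relies on the OpenCL 1.x rule that an explicit local size must divide the global size evenly.

```c
#include <stddef.h>

/* Hypothetical heuristic, NOT taken from any vendor's driver: pick the
 * largest work-group size that (a) does not exceed the device limit and
 * (b) divides the global size evenly, as OpenCL 1.x requires of an
 * explicitly supplied local size. Returns 0 if either argument is 0. */
size_t pick_local_work_size(size_t global_size, size_t max_group_size)
{
    if (global_size == 0 || max_group_size == 0)
        return 0; /* nothing sensible to choose */

    /* Start from the largest size the device allows (capped at the
     * global size) and walk down to the first even divisor. */
    size_t candidate = global_size < max_group_size ? global_size
                                                    : max_group_size;
    while (global_size % candidate != 0)
        --candidate; /* stops at 1 at the latest, since 1 divides everything */

    return candidate;
}
```

A real runtime would presumably also weigh register pressure, local memory use and the hardware's native SIMD width, which is exactly why passing NULL and leaving the choice to it is usually a reasonable default.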
Let us now move on to the points raised by Stephen Hines on the Android bug page.
"Not Google but the hardware vendors made the drivers for RenderScript Compute. ARM chose to build the RSC-compiler on top of OpenCL, because they already chose for OpenCL.
See - the hardware vendors did not create the drivers because Google or Khronos Group asked them too, they created them because they wanted to. OpenGL and WebCL are some of the reasons, but also the competition over the new desktop."
In the end, as a developer who has worked with GPGPU since the days of register combiners (on a GeForce 2), I see no reason why OpenCL would be any more disruptive to the Android ecosystem, and no explanation more plausible than the one in this answer, which states:
Apple holds the trademark on OpenCL. Google competes with Apple. Perhaps it's really that simple.
We've done work on OpenCL with Android (see here) and are happy to see it moving forward thanks to the work of Intel, Imagination, and other chip makers. Google will turn around soon enough.
Perhaps it really IS that simple.