I have a few basic algorithms (DCT/IDCT and a few others) ported to RenderScript and working (at least functionally) as expected on a Nexus 10. Since these are first implementations, their execution time currently runs into seconds, which is understandable.
However, given the architecture of RenderScript, I see that these algorithms run either on the CPU or the GPU depending on other concurrent application activity. For instance, my application has a ScrollView for images, and any activity on this view essentially pushes the RenderScript execution onto the CPU; if there is no activity, the algorithm runs on the GPU. I can observe this live via ARM DS-5 Mali/A15 traces.
This situation is a debugging/tuning nightmare: the performance delta between running the algorithm on the CPU (dual-core) and on the GPU (Mali) is on the order of 2 seconds, which makes it very difficult to gauge the improvements I am making to my algorithm code.
Is there a way to get around this problem? One possible solution would be at least a debug configuration option to choose the target (CPU or GPU) for RenderScript code.
adb shell setprop debug.rs.default-CPU-driver 1
This will force execution to run on the reference CPU implementation. There is no equivalent to force execution onto the GPU, as many conditions can make that impossible at runtime.
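To confirm the current setting, or to go back to the default driver selection afterwards, you can read and reset the property (this assumes setting it back to 0 restores the default; restart the app so the change takes effect):
adb shell getprop debug.rs.default-CPU-driver
adb shell setprop debug.rs.default-CPU-driver 0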
Also useful is:
adb shell setprop debug.rs.max-threads 1
This limits the number of CPU cores used to 1 (or any other value you set, up to the core count of the device).
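With the driver pinned to the CPU, timings become repeatable enough to compare algorithm changes against each other. A minimal timing sketch for the Java side, assuming a script file dct.rs whose generated wrapper is ScriptC_dct with a root() kernel (these names are placeholders, not taken from your project):

import android.content.Context;
import android.graphics.Bitmap;
import android.renderscript.Allocation;
import android.renderscript.RenderScript;
import android.util.Log;

// Times one kernel launch end to end (placeholder script/kernel names).
void timeKernel(Context context, Bitmap bitmap) {
    RenderScript rs = RenderScript.create(context);
    ScriptC_dct script = new ScriptC_dct(rs);              // hypothetical generated wrapper
    Allocation in = Allocation.createFromBitmap(rs, bitmap);
    Allocation out = Allocation.createTyped(rs, in.getType());

    long start = System.nanoTime();
    script.forEach_root(in, out); // kernel launch is asynchronous
    rs.finish();                  // block until all queued RS work has completed
    long ms = (System.nanoTime() - start) / 1_000_000;
    Log.d("RS", "DCT kernel: " + ms + " ms");

    rs.destroy();
}

The rs.finish() call matters: without it you would only be measuring the time to enqueue the kernel, not to run it.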