I am using OpenCL to execute a procedure on different GPUs and CPUs simultaneously to get a high performance results. The Intel OpenCL is always showing a message that the Kernel is not vectorized, so it will only run on different cores but will not run using SIMD instructions. My question is, if I rewrite the code so that the SIMD instruction can be exploit with the OpenCL code, will it increase the GPU Performance also?
Yes - but beware that this is not necessary on AMD GCN based APU/GPU or Nvidia Fermi or higher GPU hardware for good performance -they do scalar operations with great utilization. CPUs and Intels GPU however can greatly benefit via SIMD instructions which is what the vector operations boil down to.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With