I currently have a large array of floats that I process in my OpenCL kernel. I am wondering whether dividing this array up and using an array of an OpenCL vector type instead would speed up the process. Basically, if I had an array of 4,800 floats, I would divide it up into an array of 300 float16 vectors. Would this take advantage of SIMD?
Intel actually describes what their OpenCL SDK does: see "Writing Optimal OpenCL™ Code with Intel® OpenCL SDK". You might want to check that out in addition to benchmarking. The interesting part starts at chapter 2.3.
To answer your question: yes, it will take advantage of SIMD. But to "maximize utilization of the CPU vector units by using vector data types" you should really read that document.
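For illustration, here is a minimal sketch of what the change could look like. The kernel names `scale` and `scale16`, the multiply-by-`k` body, and the helper `vec16_work_items` are all hypothetical, since the question does not show the actual kernel. It also assumes the array length is a multiple of 16, as with 4,800 floats.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 16 /* number of floats in one OpenCL float16 */

/* Hypothetical scalar kernel: each work-item handles one float. */
const char *const SCALAR_KERNEL =
    "__kernel void scale(__global float *data, const float k) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] *= k;\n"
    "}\n";

/* Hypothetical float16 kernel: each work-item handles sixteen floats.
 * The scalar k is applied to every lane of the vector. */
const char *const VECTOR_KERNEL =
    "__kernel void scale16(__global float16 *data, const float k) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] *= k;\n"
    "}\n";

/* Global work size for the float16 kernel: one work-item per 16 floats.
 * Assumption: n_floats is a multiple of 16, so no tail needs handling. */
size_t vec16_work_items(size_t n_floats) {
    assert(n_floats % LANES == 0);
    return n_floats / LANES;
}
```

The buffer itself does not need to be repacked: the same 4,800 contiguous floats can be passed to `scale16`, and the host only changes the global work size from 4,800 to `vec16_work_items(4800)`, which is 300. Whether this is actually faster than the scalar version depends on the device and compiler, so benchmark both.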