I want to write a program for the GPU (preferably in OpenCL), and a large part of the computation consists of counting the number of 1s in a bit array (packed as long or int).
On modern CPUs I would obviously just use the native __popcnt instruction. I have read in several places that modern GPUs also have this instruction in hardware, which would be a huge speedup for me (at least for 32-bit; I am not sure about 64-bit).
However, I cannot find anywhere how to use this instruction. So:
1) how should I find out which GPUs have this instruction? (I still need to buy my GPU, so it will be a modern high-end one... probably Radeon HD7000 series or nVidia Kepler)
2) how to call this instruction from OpenCL (or a similar GPU language)?
This is available as the extension cl_amd_popcnt. I have a Radeon 6870 card and an Opteron 6128 CPU, and both support the extension.
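To check a given device yourself, you can query its extension string with clGetDeviceInfo and CL_DEVICE_EXTENSIONS, which returns a space-separated list of names. Here is a minimal sketch of the string check only; the helper name has_extension is my own, not part of OpenCL, and the call that fetches the string is left out so the snippet builds without the OpenCL headers.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): returns true if `name` appears
   as a whole token in a space-separated extension list, such as the string
   returned by clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...). */
bool has_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    const char *p = extensions;

    if (len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is bounded by spaces or string ends,
           so a longer name that merely contains `name` does not count. */
        bool starts = (p == extensions) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
        p += len;
    }
    return false;
}
```

You would call it as has_extension(ext_string, "cl_amd_popcnt") once for each device you enumerate.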
Even better news for you is that as of OpenCL 1.2, it is no longer an extension. See the built-in function popcount on the reference card and in the spec. The AMD 7xxx series hardware is OpenCL 1.2 compatible, and I imagine the new Nvidia hardware is too.
"T is type char, charn, uchar, ucharn, short, shortn, ushort, ushortn, int, intn, uint, uintn, long, longn, ulong, or ulongn, where n is 2, 3, 4, 8, or 16"
T popcount(T x) returns the number of populated (non-zero) bits in x.
http://www.khronos.org/registry/cl/sdk/1.2/docs/OpenCL-1.2-refcard.pdf
http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf
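For what it's worth, here is roughly how calling it could look. This is only a sketch: the kernel name count_bits, the buffer layout (one work-item per 32-bit word) and the host-side reference function are my own assumptions for illustration. The kernel is kept as a C string, the way you would pass it to clCreateProgramWithSource, and the reference function is there so you can compare device results against the CPU.

```c
#include <stdint.h>

/* OpenCL C source for a hypothetical kernel: each work-item counts the
   set bits of one 32-bit word. popcount is a built-in as of OpenCL 1.2. */
const char *count_bits_src =
    "__kernel void count_bits(__global const uint *words,\n"
    "                         __global uint *counts)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    counts[i] = popcount(words[i]);\n"
    "}\n";

/* Portable host-side reference, useful for validating device output. */
unsigned popcount32_ref(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
}
```

Since T also covers long and ulong, the same kernel should work for 64-bit words by changing uint to ulong. Note that popcount returns the same type T it is given.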