I'm looking for a high-performance multiscan / multi prefix-sum (many rows in a single kernel execution) function for my CUDA project.
I've tried the one from the Thrust library, but it's way too slow. Thrust also crashes after being compiled with the nvcc debug flags (-g -G).
After my failure with Thrust I focused on the cuDPP library, which used to be part of the CUDA toolkit. cuDPP's performance is really good, but the library is not up to date with the latest CUDA 5.5, and there are some global memory violation issues in the cudppMultiScan() function while debugging with the memory checker (CUDA 5.5, Nsight 3.1, Visual Studio 2010, GTX 260 cc 1.3).
Does anybody have any idea what to use instead of these two libraries?
R.
These libraries, especially Thrust, try to be as generic as possible, and optimization often requires specialization: for example, a specialization of an algorithm can use shared memory for fundamental types (like int or float) while the generic version can't. It can happen that for a particular situation a specialization is missing!
It's a good idea to use these well-tested generic libraries as much as possible, but sometimes, for performance-critical sections, your own implementation is an option to consider.
In your situation you want many scans running in parallel, one per row. A good implementation would not run a separate scan for each row: it would have a single kernel launch process all elements of all rows simultaneously. Depending on its index, each thread knows which row it is processing and ignores all data outside that row, as in the sketch below.
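This is only an illustrative sketch of that single-kernel layout, not the questioner's code or any library's API: one block per row, a shared-memory Hillis-Steele inclusive scan, and the assumption that a row's length is a power of two and fits inside a single block. The kernel name and the ROW_LEN macro are made up; a production version would tile longer rows and handle arbitrary lengths.

    // Sketch: each block scans one row; threads never touch other rows.
    // Assumes ROW_LEN is a power of two and blockDim.x == ROW_LEN.
    #define ROW_LEN 256

    __global__ void multi_scan_rows(const float* in, float* out, int num_rows)
    {
        __shared__ float buf[ROW_LEN];

        int row = blockIdx.x;          // this block's row
        int tid = threadIdx.x;         // this thread's position inside the row
        if (row >= num_rows) return;   // condition is uniform per block, so safe

        // Load the row into shared memory.
        buf[tid] = in[row * ROW_LEN + tid];
        __syncthreads();

        // Hillis-Steele inclusive scan over the row.
        for (int offset = 1; offset < ROW_LEN; offset <<= 1)
        {
            float val = (tid >= offset) ? buf[tid - offset] : 0.0f;
            __syncthreads();
            buf[tid] += val;
            __syncthreads();
        }

        out[row * ROW_LEN + tid] = buf[tid];
    }

    // Launch: one block per row, one thread per element of a row.
    // multi_scan_rows<<<num_rows, ROW_LEN>>>(d_in, d_out, num_rows);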
Such a specialization would require a functor that returns an absorbing value to prevent rows from mixing. Still, your own careful implementation would likely be way faster.
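For reference, one Thrust-only way to keep rows separate is its keyed (segmented) scan rather than a hand-written operator; the sketch below is my assumption of what that could look like, with made-up sizes and a helper functor named row_of, and it may well show the same performance problem you already observed with Thrust.

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/scan.h>
    #include <thrust/iterator/counting_iterator.h>

    // Functor mapping a flat element index to the index of its row.
    struct row_of
    {
        int row_len;
        row_of(int len) : row_len(len) {}
        __host__ __device__ int operator()(int i) const { return i / row_len; }
    };

    int main()
    {
        const int num_rows = 4;      // illustrative sizes only
        const int row_len  = 1024;
        const int n        = num_rows * row_len;

        thrust::device_vector<float> data(n, 1.0f);   // every row filled with 1.0
        thrust::device_vector<float> result(n);
        thrust::device_vector<int>   keys(n);

        // keys[i] = i / row_len, i.e. the row index of element i.
        thrust::transform(thrust::counting_iterator<int>(0),
                          thrust::counting_iterator<int>(n),
                          keys.begin(), row_of(row_len));

        // Segmented scan: the running sum restarts whenever the key changes,
        // so all rows are scanned in one call without mixing.
        thrust::inclusive_scan_by_key(keys.begin(), keys.end(),
                                      data.begin(), result.begin());
        return 0;
    }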
To write your own prefix scan, you may refer to existing scan implementations and literature (for example, the scan sample shipped with the CUDA SDK).
To do a multi prefix-sum you can launch the same kernel many times (as suggested by a.lasram) or try to achieve concurrency with CUDA streams (see the sketch below), although I do not know whether this will work effectively for your card.
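Here is a minimal sketch of the streams variant, under my own assumptions: the per-row kernel scan_one_row is a deliberately naive single-thread placeholder just to keep the example short, and the function and variable names are made up. Note that concurrent kernel execution requires compute capability 2.0 or higher, so on a GTX 260 (cc 1.3) the launches would effectively serialize.

    #include <cuda_runtime.h>

    // Placeholder kernel: one thread performs a sequential inclusive scan of one row.
    __global__ void scan_one_row(const float* in, float* out, int row_len)
    {
        float sum = 0.0f;
        for (int i = 0; i < row_len; ++i)
        {
            sum += in[i];
            out[i] = sum;
        }
    }

    // Launch the same kernel once per row, each launch on its own stream,
    // so that independent rows may overlap where the hardware allows it.
    void multi_scan_with_streams(const float* d_in, float* d_out,
                                 int num_rows, int row_len)
    {
        cudaStream_t* streams = new cudaStream_t[num_rows];
        for (int r = 0; r < num_rows; ++r)
            cudaStreamCreate(&streams[r]);

        // Each row's pointers are offsets into the flat row-major array.
        for (int r = 0; r < num_rows; ++r)
            scan_one_row<<<1, 1, 0, streams[r]>>>(d_in + r * row_len,
                                                  d_out + r * row_len,
                                                  row_len);

        for (int r = 0; r < num_rows; ++r)
        {
            cudaStreamSynchronize(streams[r]);
            cudaStreamDestroy(streams[r]);
        }
        delete[] streams;
    }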