Array to sort has approximately one million strings, where every string can have length up to one million characters.
I am looking for any implementation of sorting algorithm for GPU.
I have a block of data with size approximately 1MB and I need to construct suffix array. Now you can see how it is possible to have one million strings inside really small amount of memory.
The state of the art in GPU sorting isn't particularly encouraging.
For sorting 32-bit integers the following paper from 2009 (with 2 authors who are researchers at Nvidia) only claims 23% increase for the best CUDA sort on GTX280 compared to the best CPU sort on a 4 core Yorkfield.
http://www.mgarland.org/files/papers/gpusort-ipdps09.pdf
This used a radix sort on the GPU, and merge sort on CPU. You'd need a comparison-based sort in order to construct a suffix array, so instead of GPU radix sort the best of those in the paper would be GPU merge sort, which achieved about half the speed of GPU radix sort (with 1 million keys) - ie about 40% slower than the CPU merge sort.
Adding variable length keys seems likely to cause threads in a warp will get out of sync on a GPU, so would reduce the performance on GPU more than CPU.
Overall if your purpose is to build an efficient system, I'd recommend that you use a CPU implementation for this problem because it will be faster and easier to write.
But, if your purpose is to experiment or just to learn about GPU, then you can find the CUDA implementation of merge sort from the paper in the CUDA SDK:
http://developer.download.nvidia.com/compute/cuda/sdk/website/Data-Parallel_Algorithms.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With