
What is a good sorting algorithm on CUDA?

Tags: sorting, cuda

I have an array of structs and I need to sort it by one member of the struct (N). The struct looks like this:

 struct OB; // another struct, defined elsewhere

 struct OBJ
 {
   int N; // sort the array of OBJ with respect to N
   OB *c; // OB is another struct
 };

The array is small, about 512 elements, but each element is big, so I cannot copy the array to shared memory.

What is the simplest "good" way to sort this array? Since the number of elements is small, I do not need a complex algorithm that takes a lot of time to implement. I just need a simple one.

Note: I have read some papers about sorting algorithms on GPUs, but the speed gain they report only shows up when the array is very big. I therefore did not try to implement their algorithms, because my array is small. I only need a simple way to sort my array in parallel. Thanks.

liz asked Mar 13 '11 11:03



3 Answers

What do "big" and "small" mean?

By "big" I assume you mean more than 1M elements, and by "small" something small enough to actually fit in shared memory (probably fewer than 1K elements). If my understanding of "small" matches yours, I would try the following:

  • Use only a single block to sort the array (it can then be part of some bigger CUDA kernel).
  • Bitonic sort is one of the good approaches that can be adapted into a parallel algorithm.

Some pages on bitonic sort:

  • Bitonic sort (nice explanation, an applet to visualise it, and Java source that does not take too much space)
  • Wikipedia (the explanation is a bit too short for my taste, but there is more source code: some in an abstract language and some in Java)
  • NVIDIA code samples (a sample source in CUDA. I think it is a bit overfocused on eliminating bank conflicts. I believe simpler code may actually perform faster)

I once also implemented a bubble sort (lol!) for a single warp to sort arrays of 32 elements. Thanks to its simplicity, it actually did not perform that badly. A well-tuned bitonic sort will still be faster, though.
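As a rough illustration of the single-block bitonic sort suggested above, here is a minimal sketch. It rests on several assumptions that are not in the question or this answer: the element count is exactly 512 (a power of two), one thread handles one element, and only the keys `N` plus an index array are placed in shared memory, so the large structs themselves never have to fit there. The names `COUNT`, `bitonicSortByN` and `sortedOrder` are made up for this example, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

struct OB;                         // defined elsewhere, as in the question
struct OBJ { int N; OB *c; };

constexpr int COUNT = 512;         // assumption: power of two, one thread per element

// Fills order[] so that objs[order[0]].N <= objs[order[1]].N <= ...
__global__ void bitonicSortByN(const OBJ *objs, int *order)
{
    __shared__ int key[COUNT];     // only the keys go to shared memory
    __shared__ int idx[COUNT];     // original position of each key

    const int t = threadIdx.x;
    key[t] = objs[t].N;
    idx[t] = t;
    __syncthreads();

    for (int k = 2; k <= COUNT; k <<= 1) {        // length of the bitonic runs
        for (int j = k >> 1; j > 0; j >>= 1) {    // compare-exchange distance
            const int partner = t ^ j;
            if (partner > t) {                    // one thread per pair does the work
                const bool ascending = (t & k) == 0;
                const bool outOfOrder = ascending ? key[t] > key[partner]
                                                  : key[t] < key[partner];
                if (outOfOrder) {
                    int tmp = key[t]; key[t] = key[partner]; key[partner] = tmp;
                    tmp = idx[t];     idx[t] = idx[partner]; idx[partner] = tmp;
                }
            }
            __syncthreads();                      // reached by every thread in the block
        }
    }
    order[t] = idx[t];
}

// Host helper (hypothetical): host.size() must equal COUNT.
std::vector<int> sortedOrder(const std::vector<OBJ> &host)
{
    OBJ *dObjs = nullptr;
    int *dOrder = nullptr;
    cudaMalloc(&dObjs, COUNT * sizeof(OBJ));
    cudaMalloc(&dOrder, COUNT * sizeof(int));
    cudaMemcpy(dObjs, host.data(), COUNT * sizeof(OBJ), cudaMemcpyHostToDevice);

    bitonicSortByN<<<1, COUNT>>>(dObjs, dOrder);  // a single block

    std::vector<int> order(COUNT);
    cudaMemcpy(order.data(), dOrder, COUNT * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dObjs);
    cudaFree(dOrder);
    return order;
}
```

The result is a permutation rather than a moved array: element `i` of the sorted sequence is `objs[order[i]]`. Bitonic sort is not stable, so elements with equal `N` may come out in either order.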

CygnusX1 answered Oct 23 '22 06:10



Use the sorting calls available in the CUDPP or the Thrust library.

If you use cudppSort, note that it only works with integers or floats. To sort your array of structures, you can first sort the keys along with an index array. Later, you can use the sorted index array to move the structures to their final sorted location. I have described how to do this for the cudppCompact compaction algorithm in a blog post here. The steps are similar for sorting an array of structs using cudppSort.
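The key-plus-index approach described above can be sketched as follows. This example uses Thrust rather than CUDPP, since Thrust is the other library this answer names. The function name `sortByN` is hypothetical, and the keys are extracted on the host purely to keep the sketch short.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <vector>

struct OB;                         // defined elsewhere, as in the question
struct OBJ { int N; OB *c; };

// Hypothetical helper: returns a copy of `input` sorted by N.
std::vector<OBJ> sortByN(const std::vector<OBJ> &input)
{
    const size_t n = input.size();

    // Pull the keys out of the structs (done on the host for simplicity).
    std::vector<int> hostKeys(n);
    for (size_t i = 0; i < n; ++i) hostKeys[i] = input[i].N;

    thrust::device_vector<OBJ> objs(input.begin(), input.end());
    thrust::device_vector<int> keys(hostKeys.begin(), hostKeys.end());
    thrust::device_vector<int> index(n);
    thrust::sequence(index.begin(), index.end());          // 0, 1, 2, ...

    // Step 1: sort the keys and carry the index array along.
    thrust::sort_by_key(keys.begin(), keys.end(), index.begin());

    // Step 2: use the sorted indices to move each struct to its final slot.
    thrust::device_vector<OBJ> sorted(n);
    thrust::gather(index.begin(), index.end(), objs.begin(), sorted.begin());

    std::vector<OBJ> result(n);
    thrust::copy(sorted.begin(), sorted.end(), result.begin());
    return result;
}
```

Sorting plain `int` keys keeps the sort itself cheap, and each large struct is moved only once, in the gather step.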

Ashwin Nanjappa answered Oct 23 '22 05:10



Why exactly are you heading towards CUDA? It smells like your problem is not one of those that CUDA is very good at. You just want to sort an array of 512 elements and have some pointers refer to another location. This is nothing fancy, so use a simple serial algorithm for it, e.g. Quicksort, Heapsort or Mergesort.
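For comparison, a minimal host-side sketch of the serial route. It uses `std::sort` from the C++ standard library instead of a hand-written Quicksort, Heapsort or Mergesort, which is a substitution made for this example and not something this answer specifies. The function name `sortByNOnHost` is hypothetical. The code is host-only, so it compiles in a `.cu` file or as plain C++.

```cuda
#include <algorithm>
#include <vector>

struct OB;                         // defined elsewhere, as in the question
struct OBJ { int N; OB *c; };

// Sorts the array in place on the CPU, ordered by N (ascending).
void sortByNOnHost(std::vector<OBJ> &objs)
{
    std::sort(objs.begin(), objs.end(),
              [](const OBJ &a, const OBJ &b) { return a.N < b.N; });
}
```

With about 512 elements this involves no device transfers at all.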

Additionally, think about the overhead of copying the data from your heap or stack to the CUDA device and back. Using CUDA only makes sense when the calculations are intense enough that COMPUTING_TIME_ON_CUDA + COPY_DATA_FROM_HEAP_TO_CUDA_DEVICE + COPY_DATA_FROM_CUDA_DEVICE_TO_HEAP < COMPUTING_TIME_ON_HOST_CPU.

Besides, CUDA is immensely powerful at math on big vectors and matrices of rather simple data types (numbers), because that is the kind of problem that often arises on a GPU: calculating graphics.

crusoe answered Oct 23 '22 06:10
