I would like to know if thrust::sort() can be used inside a thread
__global__
void mykernel(float* array, int arrayLength)
{
int threadID = blockIdx.x * blockDim.x + threadIdx.x;
// array length is vector in the device global memory
// is it possible to use inside the thread?
thrust::sort(array, array+arrayLength);
// do something else with the array
}
If yes, does the sort launch other kernels to parallelize the sort?
Yes, thrust::sort
can be combined with the thrust::seq
execution policy to sort numbers sequentially within a single CUDA thread (or sequentially within a single CPU thread):
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
__global__
void mykernel(float* array, int arrayLength)
{
int threadID = blockIdx.x * blockDim.x + threadIdx.x;
// each thread sorts array
// XXX note this causes a data race
thrust::sort(thrust::seq, array, array + arrayLength);
}
Note that your example causes a data race because each CUDA thread attempts to sort the same data in parallel. A correct race-free program would partition array
according to thread index.
The thrust::seq
execution policy, which is required for this feature, is only available in Thrust v1.8 or better.
@aland already referred you to an earlier answer about calling Thrust's parallel algorithms on the GPU - in that case the asker was simply trying to sort data which was already on the GPU; Thrust called from the CPU can handle GPU-resident data by cast pointers to vectors.
Assuming your question is different and you really want to call a parallel sort in the middle of your kernel (as opposed to break the kernel into multiple smaller kernels and call sort in between) then you should consider CUB, which provides a variety of primitives suitable for your purposes.
Update: Also see @Jared's answer in which he explains that you can call Thrust's sequential algorithms from on the GPU as of Thrust 1.7.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With