Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Histogram calculation with Thrust

Tags:

cuda

thrust

If i is a random walk like below (each index is not unique), and there is a device vector A filled with zeros.

{0, 1, 0, 2, 3, 3,  ....}

Is it possible that thrust can make A[i] auto-increment, after the operation A may look like

//2 means appears count of 0's
//1 means appears count of 1's
//1 means appears count of 2's
//2 means appears count of 3's
{2, 1, 1, 2}

I had tried several cases, but these case only works fine when A is a host vector, I guess that because thrust do the parallel, that it previous result can't affect the new one, the result may look like //only count once no matter the index appear how many times {1, 1, 1, 1}

Could thrust achieve my goal with device vector A and a random walk index vector?

like image 761
user1995868 Avatar asked Oct 21 '22 18:10

user1995868


1 Answers

If you are seeking histogram calculations by Thrust, then you may wish to note that there is a Thrust documentation example providing two different algorithms:

  1. Dense histogram, using sort to sort the array, then using upper_bound to determine a comulative histogram and finally using adjacent_difference to calculating the histogram;
  2. Sparse histogram, using sort to sort the array and then reduce_by_key, as mentioned by @Eric in his comment.

From this two threads

  • histogram or count_by_key;
  • How to obtain a histogram from a sorted sequence.

I would say that the above are the only two methods to achieve an histogram using Thrust. I have timed both the approaches on a Kepler K20c card and these have been the timing:

  • N=1024*16; # bins = 16*16; Dense = 2.0ms; Sparse = 2.4ms;
  • N=1024*128; # bins = 16*128; Dense = 3.4ms; Sparse = 3.1ms;

On accounting for the fact that the timing does depend on the input array, I would say that the results do not appear to be dramatically different.

It should be noticed that the CUDA samples provide a histogram calculation example, but it is optimized for 64 or 256 bins, so it is not homogeneous to the above mentioned Thrust codes.

like image 99
Vitality Avatar answered Oct 31 '22 14:10

Vitality