
Strategy for doing final reduction

I am trying to implement an OpenCL reduction (sum) of an array of floats.

To do so, I started from the following code snippet found on the web:

__kernel void sumGPU(__global const double *input,
                     __global double *partialSums,
                     __local double *localSums)
{
    uint local_id = get_local_id(0);
    uint group_size = get_local_size(0);

    // Copy from global memory to local memory
    localSums[local_id] = input[get_global_id(0)];

    // Loop for computing localSums
    for (uint stride = group_size / 2; stride > 0; stride /= 2)
    {
        // Wait until every work-item of the work-group has finished
        // the previous round of pairwise additions
        barrier(CLK_LOCAL_MEM_FENCE);

        // Divide the work-group into 2 halves and add elements pairwise
        // between local_id and local_id + stride
        if (local_id < stride)
            localSums[local_id] += localSums[local_id + stride];
    }

    // Write result into partialSums[nWorkGroups]
    if (local_id == 0)
        partialSums[get_group_id(0)] = localSums[0];
}

This kernel works well, but I would like to compute the final sum by adding up the partial sums of all work-groups. Currently, I do this final step on the CPU with a simple loop of nWorkGroups iterations.
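For reference, that CPU step amounts to something like the sketch below, in plain C. The function name finalSumCPU is only illustrative, and the sketch assumes partialSums has already been read back into host memory (for example with clEnqueueReadBuffer).

```c
#include <stddef.h>

/* Hypothetical host-side helper: adds up the nWorkGroups partial sums
   that the kernel wrote into partialSums (already copied to the host). */
double finalSumCPU(const double *partialSums, size_t nWorkGroups)
{
    double total = 0.0;
    for (size_t i = 0; i < nWorkGroups; ++i)
        total += partialSums[i];
    return total;
}
```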

I also saw another solution based on atomic functions, but they seem to be available only for int, not for float. I think that only CUDA provides atomic functions for float.

I also saw that I could use a second kernel to perform this summation, but I would like to avoid that in order to keep the source simple and readable. Maybe I cannot do without it...

I must tell you that I use OpenCL 1.2 (returned by clinfo) on a Radeon HD 7970 Tahiti 3GB (I think that OpenCL 2.0 is not supported with my card).

More generally, I would like to get advice about the simplest method to perform this last final summation with my graphics card model and OpenCL 1.2.

youpilat13 asked Apr 27 '16 02:04



1 Answer

If the magnitude of the sum stays below the exa scale (about 10^18, so that its integer part still fits in a 64-bit long), then:

Instead of

if (local_id == 0)
  partialSums[get_group_id(0)] = localSums[0];

You could use

if (local_id == 0)
{
    if (strategy == ATOMIC)
    {
        // atom_add on long requires the cl_khr_int64_base_atomics extension
        long integer_part = getIntegerPart(localSums[0]);
        atom_add(&totalSumIntegerPart[0], integer_part);
        // Scale by 1000000 to keep 6 decimal digits of the fraction as an integer
        long float_part = 1000000 * getFloatPart(localSums[0]);
        atom_add(&totalSumFloatPart[0], float_part);
    }
}

The accumulated fractional part can grow past 1000000, so when you divide it by 1000000 in another kernel, the quotient may be larger than 1. Take the integer part of that quotient and add it to the accumulated integer part:

   float value=0;
   if(strategy==ATOMIC)
   {
       float float_part=getFloatPart_(totalSumFloatPart[0]);
       float integer_part=getIntegerPart_(totalSumFloatPart[0])
       + totalSumIntegerPart[0];
       value=integer_part+float_part;
   }

Just a few atomic operations should not noticeably affect the total kernel time.

Some of these get___part helpers can be written easily with floor and similar functions. The ones that work on the scaled total also need a division by 1000000.
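A minimal sketch of how those helpers might look, written in plain C so it can be tried on the host. Several things here are assumptions: int64_t stands in for OpenCL C's 64-bit long, floor is emulated with a cast so the sketch needs no math library (in a kernel the built-in floor would do), and combineParts is a hypothetical name for the recombination step shown above.

```c
#include <stdint.h>

#define FRACTION_SCALE INT64_C(1000000) /* same scale as the 1000000 above */

/* Integer part, rounded toward negative infinity (what floor would give). */
int64_t getIntegerPart(double x)
{
    int64_t i = (int64_t)x;             /* cast truncates toward zero */
    return (x < (double)i) ? i - 1 : i; /* step down for negative non-integers */
}

/* Fractional part in [0, 1): x == getIntegerPart(x) + getFloatPart(x). */
double getFloatPart(double x)
{
    return x - (double)getIntegerPart(x);
}

/* Whole units carried out of the accumulated, scaled fractional total. */
int64_t getIntegerPart_(int64_t scaledFraction)
{
    return scaledFraction / FRACTION_SCALE;
}

/* Fraction left after removing the carry, converted back to [0, 1). */
double getFloatPart_(int64_t scaledFraction)
{
    return (double)(scaledFraction % FRACTION_SCALE) / (double)FRACTION_SCALE;
}

/* Hypothetical recombination of the two accumulated totals. */
double combineParts(int64_t totalInteger, int64_t totalScaledFraction)
{
    return (double)(totalInteger + getIntegerPart_(totalScaledFraction))
         + getFloatPart_(totalScaledFraction);
}
```

For example, partial sums 1.25, 2.5 and 3.75 accumulate to an integer total of 6 and a scaled fractional total of 1500000; the carry of 1 moves to the integer side and the remaining 500000 becomes 0.5, giving 7.5. Because the fraction is always non-negative here, the scaled total never needs a negative carry.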

huseyin tugrul buyukisik answered Sep 25 '22 13:09