What is the meaning of the statement
// create arrays of 1M elements
const int num_elements = 1<<20;
in the code below? Is it specific to CUDA, or can this be used in standard C as well?
When I printf'ed num_elements I got num_elements == 1048576, which turns out to be 2^20. So is the << operator a shorthand for exponentiation in C?
// This example demonstrates parallel floating point vector
// addition with a simple __global__ function.
#include <stdlib.h>
#include <stdio.h>
// this kernel computes the vector sum c = a + b
// each thread performs one pair-wise addition
__global__ void vector_add(const float *a,
                           const float *b,
                           float *c,
                           const size_t n)
{
    // compute the global element index this thread should process
    unsigned int i = threadIdx.x + blockDim.x * blockIdx.x;

    // avoid accessing out of bounds elements
    if(i < n)
    {
        // sum elements
        c[i] = a[i] + b[i];
    }
}
int main(void)
{
    // create arrays of 1M elements
    const int num_elements = 1<<20;

    // compute the size of the arrays in bytes
    const int num_bytes = num_elements * sizeof(float);

    // pointers to host & device arrays
    float *device_array_a = 0;
    float *device_array_b = 0;
    float *device_array_c = 0;
    float *host_array_a = 0;
    float *host_array_b = 0;
    float *host_array_c = 0;

    // malloc the host arrays
    host_array_a = (float*)malloc(num_bytes);
    host_array_b = (float*)malloc(num_bytes);
    host_array_c = (float*)malloc(num_bytes);

    // cudaMalloc the device arrays
    cudaMalloc((void**)&device_array_a, num_bytes);
    cudaMalloc((void**)&device_array_b, num_bytes);
    cudaMalloc((void**)&device_array_c, num_bytes);

    // if any memory allocation failed, report an error message
    if(host_array_a == 0 || host_array_b == 0 || host_array_c == 0 ||
       device_array_a == 0 || device_array_b == 0 || device_array_c == 0)
    {
        printf("couldn't allocate memory\n");
        return 1;
    }

    // initialize host_array_a & host_array_b
    for(int i = 0; i < num_elements; ++i)
    {
        // make array a a linear ramp
        host_array_a[i] = (float)i;

        // make array b random
        host_array_b[i] = (float)rand() / RAND_MAX;
    }

    // copy arrays a & b to the device memory space
    cudaMemcpy(device_array_a, host_array_a, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(device_array_b, host_array_b, num_bytes, cudaMemcpyHostToDevice);

    // compute c = a + b on the device
    const size_t block_size = 256;
    size_t grid_size = num_elements / block_size;

    // deal with a possible partial final block
    if(num_elements % block_size) ++grid_size;

    // launch the kernel
    vector_add<<<grid_size, block_size>>>(device_array_a, device_array_b, device_array_c, num_elements);

    // copy the result back to the host memory space
    cudaMemcpy(host_array_c, device_array_c, num_bytes, cudaMemcpyDeviceToHost);

    // print out the first 10 results
    for(int i = 0; i < 10; ++i)
    {
        printf("result %d: %1.1f + %7.1f = %7.1f\n",
               i, host_array_a[i], host_array_b[i], host_array_c[i]);
    }

    // deallocate memory
    free(host_array_a);
    free(host_array_b);
    free(host_array_c);
    cudaFree(device_array_a);
    cudaFree(device_array_b);
    cudaFree(device_array_c);
}
No, the << operator is the left bit-shift operator. It is standard C (and C++), not specific to CUDA. It takes the bits of a number, such as 00101, and shifts them n places to the left, which has the effect of multiplying the number by a power of two. So x << y is x * 2^y. This is a result of the way numbers are stored internally in computers, which is binary.
For example, the number 1 is, when stored as a 32-bit integer in 2's complement (which it is):
00000000000000000000000000000001
When you do
1 << 20
You are taking all the 1's in that binary representation and moving them over 20 places:
00000000000100000000000000000000
Which is 2^20. This also works for sign-magnitude representation, 1's complement, etc., since all of these represent non-negative numbers the same way.
Another example, if you take the representation of 5:
00000000000000000000000000000101
And do 5 << 1, you get
00000000000000000000000000001010
Which is 10, or 5 * 2^1.
Conversely, the >> operator moves the bits n places to the right, which divides a non-negative number by a power of 2 (discarding any remainder). For negative signed values the result of >> is implementation-defined in C.