
Difference between openMP's target and target data?

The target construct offloads a code region from the host to a target device. The variables p, v1 and v2 are explicitly mapped to the target device using the map clause. The target data construct seems to do the same.

Then what is implied by:

  • "The construct creates variables that will persist throughout the target data region"
  • "New device data environment creation"

with regard to the "target data" construct?

In other words, how does the offloading mechanism differ among these codes:

void vec_mult1(float *p, float *v1, float *v2, int N)
{
    int i;
    init(v1, v2, N);
#pragma omp target map(to: v1[0:N], v2[:N]) map(from: p[0:N])
#pragma omp parallel for
    for (i=0; i<N; i++)
        p[i] = v1[i] * v2[i];
    output(p, N);
}


void vec_mult2(float *p, float *v1, float *v2, int N)
{
    int i;
    init(v1, v2, N);
#pragma omp target data device(mic0) map(to: v1[0:N], v2[:N]) map(from: p[0:N])
    {
    //this code runs on accelerator card
#pragma omp target // if we omit this, what difference will it make?
#pragma omp parallel for
        for (i=0; i<N; i++)
            p[i] = v1[i] * v2[i];
    }
    output(p, N);
}

void vec_mult3(float *p, float *v1, float *v2, int N)
{
    int i;
    init(v1, v2, N);
#pragma omp target data map(to: v1[0:N], v2[:N]) map(from: p[0:N])
    {

        //target construct omitted
#pragma omp parallel for
        for (i=0; i<N; i++)
            p[i] = v1[i] * v2[i];
    }
    output(p, N);
}

I tried to execute them, but I was unable to notice any significant differences between them.

puneet336 asked Mar 07 '14 10:03


1 Answer

The target data construct only creates a device data environment that lasts for the extent of the region. It establishes the mapping between the variables in the device data environment and those in the data environment of the encountering task, but it does not offload any code by itself: statements inside the region still execute on the host unless they are enclosed in a target construct. The rationale behind having a separate construct is that in many cases it is desirable for certain data to remain on the device instead of being transferred to and from it over and over.
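One way to see that target data only manages data is to ask the runtime where the code is executing. The following is a minimal sketch, not part of the original answer: the function names in_target_data_region and in_target_region are hypothetical, and the fallback definition of omp_is_initial_device is an assumption added so that the file also compiles without OpenMP support.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption: without OpenMP everything runs on the host. */
static int omp_is_initial_device(void) { return 1; }
#endif

/* Hypothetical helper: returns 1 if code placed directly inside a
   target data region executes on the host. */
int in_target_data_region(float *p, int n)
{
    int on_host = 0;
    assert(p != NULL && n > 0);

    #pragma omp target data map(tofrom: p[0:n])
    {
        /* No target construct here, so this statement is host code. */
        on_host = omp_is_initial_device();
    }
    return on_host;
}

/* Hypothetical helper: returns 0 if the target region was really offloaded,
   1 if it fell back to the host (for example, no device available). */
int in_target_region(float *p, int n)
{
    int on_host = 1;
    assert(p != NULL && n > 0);

    #pragma omp target map(tofrom: p[0:n]) map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
    return on_host;
}
```

With a compiler and runtime that actually offload, in_target_data_region should return 1 and in_target_region should return 0. Applied to the question, this suggests that the loop in vec_mult3 runs on the host, because no target construct encloses it.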

Imagine the following very artificial example:

int data[N];

#pragma omp target
#pragma omp parallel for
for (int i = 0; i < N; i++)
   data[i] *= 2;

// Do something else

#pragma omp target
#pragma omp parallel for
for (int i = 0; i < N; i++)
   data[i] += 5;

In this case, each of the two target constructs creates its own device data environment. The data variable is implicitly mapped as tofrom, so the following sequence of actions takes place:

  1. data is copied to the device
  2. The first loop runs on the device
  3. data is copied from the device
  4. The host executes // Do something else
  5. data is copied to the device
  6. The second loop runs on the device
  7. data is copied from the device
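The implicit mapping can also be written out explicitly, which makes the four transfers visible in the source. This is a minimal sketch under the assumption that the array is passed as a pointer plus a length; double_then_add is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: two independent offloads, each with its own
   device data environment and therefore its own pair of transfers. */
void double_then_add(int *data, int n)
{
    assert(data != NULL && n > 0);

    /* Copy to the device on entry, copy back on exit. */
    #pragma omp target map(tofrom: data[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        data[i] *= 2;

    /* Host work that only reads data would go here. */

    /* Copy to the device again on entry, copy back again on exit. */
    #pragma omp target map(tofrom: data[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        data[i] += 5;
}
```

If no device is available, or the code is compiled without OpenMP, the loops simply run on the host and the result is the same.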

Now imagine that // Do something else reads data but never modifies it. That makes the transfer of data to the device in step 5 redundant: the device copy could simply be retained in the state it has after step 2. This is where the target data construct comes into play. It allows you to create a device data environment that outlives a single target construct. The example above can then be rewritten as follows:

int data[N];

#pragma omp target data map(tofrom: data)
{
   #pragma omp target
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      data[i] *= 2;

   #pragma omp target update from(data)

   // Do something else

   #pragma omp target
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      data[i] += 5;
}

The target constructs in that case do not create new device data environments but rather utilise the one created by the target data construct (strictly speaking, they do create new data environments, but data is already present on the device, so those environments add no new mappings and trigger no transfers). The sequence of operations is therefore:

  1. data is copied to the device
  2. The first loop runs on the device
  3. data is explicitly copied from the device
  4. The host executes // Do something else
  5. The second loop runs on the device
  6. data is copied from the device

Since data is needed in // Do something else but is only automatically transferred from the device at the end of the target data region, an explicit target update is used to copy it into the encountering task's data environment at step 3.
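The same pattern can be packaged as a self-contained function. This is a minimal sketch rather than the answer's exact code: scale_then_shift is a hypothetical name, the array is assumed to arrive as a pointer plus a length, and the map clauses on the inner target constructs are added only because a pointer does not carry its extent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: doubles every element on the device, lets the host
   read the intermediate values, then adds 5 on the device.
   Returns the sum the host observed between the two offloads. */
long scale_then_shift(int *data, int n)
{
    long mid_sum = 0;
    assert(data != NULL && n > 0);

    #pragma omp target data map(tofrom: data[0:n])
    {
        /* data[0:n] is already present, so this map causes no transfer. */
        #pragma omp target map(tofrom: data[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            data[i] *= 2;

        /* Refresh the host copy so the host code below sees the doubled values. */
        #pragma omp target update from(data[0:n])

        /* Host-only work: reads data, never modifies it. */
        for (int i = 0; i < n; i++)
            mid_sum += data[i];

        /* Again no transfer: the device copy is still valid. */
        #pragma omp target map(tofrom: data[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            data[i] += 5;
    }   /* data is copied back to the host here. */

    return mid_sum;
}
```

Called on the array {1, 2, 3, 4}, the function returns 20 (the sum of the doubled values) and leaves {7, 9, 11, 13} in the array.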

This is just a small and very artificial example, but in real applications avoiding unnecessary data transfers can significantly improve the performance of OpenMP programs that offload computations to co-processors and/or accelerators.

Hristo Iliev answered Oct 18 '22 05:10