The target construct offloads a code region from host to a target device.
The variables p, v1, and v2 are explicitly mapped to the target device using the map clause.
The target data construct also does the same.
Then what is implied by :
in regard to the "target data" construct?
I mean, what differences are there in the offloading mechanism among these codes:
    void vec_mult1(float *p, float *v1, float *v2, int N)
    {
        int i;
        init(v1, v2, N);
        #pragma omp target map(to: v1[0:N], v2[:N]) map(from: p[0:N])
        #pragma omp parallel for
        for (i=0; i<N; i++)
            p[i] = v1[i] * v2[i];
        output(p, N);
    }
    void vec_mult2(float *p, float *v1, float *v2, int N)
    {
        int i;
        init(v1, v2, N);
        #pragma omp target data device(mic0) map(to: v1[0:N], v2[:N]) map(from: p[0:N])
        {
            // this code runs on accelerator card
            #pragma omp target // if we omit it what difference will it make?
            #pragma omp parallel for
            for (i=0; i<N; i++)
                p[i] = v1[i] * v2[i];
        }
        output(p, N);
    }
    void vec_mult3(float *p, float *v1, float *v2, int N)
    {
        int i;
        init(v1, v2, N);
        #pragma omp target data map(to: v1[0:N], v2[:N]) map(from: p[0:N])
        {
            // target construct omitted
            #pragma omp parallel for
            for (i=0; i<N; i++)
                p[i] = v1[i] * v2[i];
        }
        output(p, N);
    }
I tried to execute them but I'm unable to notice significant differences between them.
The target data construct only creates a device data environment that lasts for the extent of the region. It only sets the mapping between the variables in the device data environment and the data environment of the encountering task. The rationale behind having a separate construct is that in many cases it is desirable that certain data remains on the device instead of constantly being transferred to and from it.
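Applied to the functions in the question, this means that a loop placed inside target data but outside any target construct is still executed by the host. The sketch below is one way to check that, assuming an OpenMP 4.0 or later compiler. The function names where_target_data_runs and where_target_runs are made up for illustration; omp_is_initial_device() is the standard runtime routine that returns nonzero when called from host code. The #ifdef guards are only there so the file also builds without OpenMP enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: reports where the body of a "target data" region
   (with no nested "target") executes. Returns 1 for host, 0 for device. */
int where_target_data_runs(float *p, const float *v1, const float *v2, int n)
{
    int on_host = 1;
    #pragma omp target data map(to: v1[0:n], v2[0:n]) map(from: p[0:n])
    {
        /* Only the data environment was set up; this is still host code. */
#ifdef _OPENMP
        on_host = omp_is_initial_device();
#endif
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            p[i] = v1[i] * v2[i];   /* writes the host copy of p */
    }
    /* With a real device, the map(from:) copy-back at the closing brace may
       overwrite p with device memory that was never written. */
    return on_host;
}

/* Hypothetical helper: reports where the body of a "target" region executes.
   Returns 0 when the region was offloaded, 1 when it fell back to the host. */
int where_target_runs(void)
{
    int on_host = 1;
    #pragma omp target map(tofrom: on_host)
    {
#ifdef _OPENMP
        on_host = omp_is_initial_device();
#endif
    }
    return on_host;
}
```

where_target_data_runs always reports the host, which is why the version with the target construct omitted behaves like plain host code on a machine without an accelerator. What where_target_runs reports depends on whether a device is available and offloading is enabled.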
Imagine the following very artificial example:
    int data[N];

    #pragma omp target
    #pragma omp for
    for (int i = 0; i < N; i++)
        data[i] *= 2;

    // Do something else

    #pragma omp target
    #pragma omp for
    for (int i = 0; i < N; i++)
        data[i] += 5;
Now in that case, the two target constructs also create two data environments. The data variable is automatically mapped as tofrom. That means the following set of actions takes place:
1. data is copied to the device
2. the first target region executes on the device
3. data is copied from the device
4. // Do something else
5. data is copied to the device
6. the second target region executes on the device
7. data is copied from the device

Now imagine that // Do something else reads data but never modifies it. That makes the transfer of data to the device in step 5 redundant: it could just be retained in the state it has after step 2. Here is where the target data construct comes into play. It allows you to create a data environment that spans more than the extent of a single target construct. The example above can then be rewritten so:
    int data[N];

    #pragma omp target data map(tofrom: data)
    {
        #pragma omp target
        #pragma omp for
        for (int i = 0; i < N; i++)
            data[i] *= 2;

        #pragma omp target update from(data)

        // Do something else

        #pragma omp target
        #pragma omp for
        for (int i = 0; i < N; i++)
            data[i] += 5;
    }
The target constructs in that case do not create new device data environments but rather utilise the one created by the target data construct (actually they do create new data environments, but those are merged with the one from the target data construct and they do not contain any new references). So the sequence of operations is:
1. data is copied to the device
2. the first target region executes on the device
3. data is explicitly copied from the device
4. // Do something else
5. the second target region executes on the device
6. data is copied from the device

Since data is needed in // Do something else but it is only automatically transferred from the device at the end of the target data construct, the explicit target update is used to copy it into the encountering task's data environment at step 3.
Now this is just a small and very artificial example, but in real life, saving on unnecessary data transfers could significantly improve the performance of OpenMP applications that offload computations to co-processors and/or accelerators.
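For reference, the target data version can be packaged as a compilable function. This is only a sketch under a few assumptions: the name scale_then_shift and the host-side sum standing in for "// Do something else" are made up for illustration, the array is passed as a pointer so the array section data[0:n] is mapped explicitly, and parallel for is used inside each target region so that the loop is shared among the device threads.

```c
/* Hypothetical example: doubles every element on the device, reads the
   intermediate values on the host, then adds 5 on the device.
   Returns the host-side sum of the intermediate (doubled) values. */
double scale_then_shift(float *data, int n)
{
    double sum = 0.0;

    /* One device data environment for the whole sequence:
       data is copied to the device here ... */
    #pragma omp target data map(tofrom: data[0:n])
    {
        /* data is already present on the device, so this map adds no copy. */
        #pragma omp target map(tofrom: data[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            data[i] *= 2;

        /* Explicit copy back so the host sees the doubled values. */
        #pragma omp target update from(data[0:n])

        /* Stand-in for "Do something else": reads data, never modifies it. */
        for (int i = 0; i < n; i++)
            sum += data[i];

        #pragma omp target map(tofrom: data[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            data[i] += 5;
    }   /* ... and copied back from the device only here. */

    return sum;
}
```

Calling it on the array {1, 2, 3, 4} should return 20 and leave {7, 9, 11, 13} in the array, whether the regions are offloaded or fall back to the host. Only the number of transfers differs.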