
Does cudaDeviceSynchronize() wait for kernels to finish only in the current CUDA context, or in all contexts?

I use CUDA 6.5 and 4 Kepler GPUs.

I use multithreading with the CUDA runtime API and access the CUDA contexts from different CPU threads (using OpenMP, but that does not really matter).

  1. When I call cudaDeviceSynchronize(), will it wait for the kernel(s) to finish only in the current CUDA context, i.e. the one selected by the latest call to cudaSetDevice(), or in all CUDA contexts?

  2. If it waits for kernel(s) to finish in all CUDA contexts, will it wait only in the CUDA contexts used by the current CPU thread (for example, CPU thread_0 would wait for GPUs 0 and 1), or in all CUDA contexts in general (so CPU thread_0 would wait for GPUs 0, 1, 2 and 3)?

Here is the code:

// Using OpenMP requires the compiler flag:
// MSVS option: -Xcompiler "/openmp"
// GCC option: -Xcompiler -fopenmp
#include <omp.h>
#include <cuda_runtime.h>

int main() {

    // execute two threads with different: omp_get_thread_num() = 0 and 1
    #pragma omp parallel num_threads(2)
    {
        int omp_threadId = omp_get_thread_num();

        // CPU thread 0
        if(omp_threadId == 0) {

            cudaSetDevice(0);
            kernel_0<<<...>>>(...);
            cudaSetDevice(1);
            kernel_1<<<...>>>(...);

            cudaDeviceSynchronize(); // which kernel(s) will this wait for?

        // CPU thread 1
        } else if(omp_threadId == 1) {

            cudaSetDevice(2);
            kernel_2<<<...>>>(...);
            cudaSetDevice(3);
            kernel_3<<<...>>>(...);

            cudaDeviceSynchronize(); // which kernel(s) will this wait for?

        }
    }

    return 0;
}
Asked Nov 10 '14 by Alex

People also ask

What is context in CUDA?

Most CUDA functions require a context. A CUDA context is analogous to a CPU process - it's an isolated container for all runtime state, including configuration settings and the device/unified/page-locked memory allocations. Each context has a separate memory space, and pointers from one context do not work in another.
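
As a small illustration (assuming at least two GPUs are present; the buffer names and sizes are made up), each cudaSetDevice() call makes a different primary context current, each allocation belongs to the context it was made in, and data has to be moved between contexts explicitly:

#include <cuda_runtime.h>

int main() {
    float *d_buf0 = nullptr, *d_buf1 = nullptr;
    size_t bytes = 1024 * sizeof(float);

    cudaSetDevice(0);            // device 0's primary context becomes current
    cudaMalloc(&d_buf0, bytes);  // this allocation lives in device 0's context

    cudaSetDevice(1);            // switch to device 1's primary context
    cudaMalloc(&d_buf1, bytes);  // a separate allocation in device 1's context

    // d_buf0 cannot simply be dereferenced by a kernel running on device 1;
    // copying between the two contexts needs an explicit API call:
    cudaMemcpyPeer(d_buf1, 1, d_buf0, 0, bytes);

    cudaFree(d_buf1);            // free in the current (device 1) context
    cudaSetDevice(0);
    cudaFree(d_buf0);            // free in device 0's context
    return 0;
}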

Which is the correct way to launch a CUDA kernel?

To launch a CUDA kernel, you specify the grid dimension and the block dimension from the host code. They are passed between the triple angle brackets of the launch syntax; for a minimal Hello World! kernel, two 1's between the angle brackets launch a single block containing a single thread.
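
A minimal sketch of such a launch (the kernel name and its printf are just an example):

#include <cstdio>

// __global__ marks a function that runs on the GPU and is launched from the host
__global__ void hello_kernel() {
    printf("Hello World! from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello_kernel<<<1, 1>>>();   // <<<grid dimension, block dimension>>>
    cudaDeviceSynchronize();    // wait for the kernel (and its printf) to complete
    return 0;
}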

What is the function of the __global__ qualifier in a CUDA program?

__global__ is a qualifier added to standard C. It alerts the compiler that a function should be compiled to run on the device (GPU) instead of the host (CPU).

What are the three general sections of a CUDA program?

There are three key language extensions CUDA programmers can use: CUDA blocks, shared memory, and synchronization barriers. CUDA blocks contain a collection of threads. A block of threads can share memory, and multiple threads can pause until all threads reach a specified point of execution.
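
As a hedged sketch of those three extensions (the kernel and the block size of 256 are illustrative, not from the question): a block of threads stages data in shared memory and uses __syncthreads() as the barrier before reading elements written by other threads:

__global__ void reverse_block(int *data) {
    __shared__ int tile[256];            // shared memory visible to the whole block
    int t = threadIdx.x;

    tile[t] = data[t];                   // each thread of the block writes one element
    __syncthreads();                     // barrier: wait until every thread has written

    data[t] = tile[blockDim.x - 1 - t];  // now safe to read another thread's element
}

// launched as, for example: reverse_block<<<1, 256>>>(d_data);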


1 Answer

When I call cudaDeviceSynchronize(), will it wait for the kernel(s) to finish only in the current CUDA context, i.e. the one selected by the latest call to cudaSetDevice(), or in all CUDA contexts?

cudaDeviceSynchronize() syncs all streams in the current CUDA context only.


Note: cudaDeviceSynchronize() will only synchronize the host with the currently set GPU. If multiple GPUs are in use and all of them need to be synchronized, cudaDeviceSynchronize() has to be called separately for each one.

Here is a minimal example:

cudaSetDevice(0); cudaDeviceSynchronize();
cudaSetDevice(1); cudaDeviceSynchronize();
...

Source: Pawel Pomorski, slides from "CUDA on multiple GPUs", linked here.
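
Applied to the code in the question, a sketch of what CPU thread 0 would have to do so that both of its kernels are finished before it continues (thread 1 does the same for devices 2 and 3; the trivial kernels and <<<1, 1>>> configuration below just stand in for the ones elided in the question):

__global__ void kernel_0() { /* ... work on GPU 0 ... */ }
__global__ void kernel_1() { /* ... work on GPU 1 ... */ }

void thread_0_work() {
    cudaSetDevice(0);
    kernel_0<<<1, 1>>>();
    cudaSetDevice(1);
    kernel_1<<<1, 1>>>();

    // cudaDeviceSynchronize() only waits on the currently set device,
    // so each device has to be synchronized explicitly:
    cudaSetDevice(0);
    cudaDeviceSynchronize(); // waits for kernel_0 on GPU 0
    cudaSetDevice(1);
    cudaDeviceSynchronize(); // waits for kernel_1 on GPU 1
}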

Answered Oct 14 '22 by srodrb