 

How many grids in CUDA

How many CUDA grids are possible in a GPU?

Can two grids exist together in a GPU? Or does one GPU device have only one grid?

Kernel1<<<gridDim, blockDim>>>(dst1, param1);
Kernel1<<<gridDim, blockDim>>>(dst2, param2);

Do the two kernels above run concurrently or sequentially?

Cholgyun Ri asked Oct 09 '12

2 Answers

If the two kernels are issued as shown above, they will be serialized (they will run sequentially). This is because, without any other code (e.g. to switch streams), both kernels are issued to the same CUDA stream. All CUDA calls issued to the same stream execute sequentially, even if you expect otherwise because you're using cudaMemcpyAsync or something similar.

It's certainly possible to have multiple kernels running asynchronously with respect to each other (and therefore possibly concurrently), but you must use the CUDA streams API to accomplish this.

You may want to look at Section 3.2.5 "Asynchronous Concurrent Execution" in the CUDA C Programming Guide to learn more about streams and concurrent kernel execution. In addition, there are a number of samples in the NVIDIA CUDA SDK, such as simpleStreams, which illustrate the concepts. The concurrentKernels sample shows how to run multiple kernels concurrently (using streams). Note that running kernels concurrently requires hardware of compute capability 2.0 or higher.
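As an aside (a minimal sketch, not part of the original answer): you can check at runtime whether a device supports concurrent kernel execution by reading the concurrentKernels field of cudaDeviceProp. Assuming device 0:

// Sketch: query device 0 for concurrent-kernel support and compute capability
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);                        // device 0 assumed
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("concurrent kernels: %s\n", prop.concurrentKernels ? "yes" : "no");
    return 0;
}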

Also, to answer your first question: from section 3.2.5.2 of the CUDA C Programming Guide, "The maximum number of kernel launches that a device can execute concurrently varies by device but may be as high as 128 for some devices."

For reference, a "grid" is the entire thread array associated with a single kernel launch.
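To make that concrete, here is a minimal illustrative sketch (the kernel name and launch dimensions are assumptions, not from the answer): a single launch below creates one grid of 4 blocks with 256 threads each.

// One kernel launch = one grid; here the grid contains 4 blocks of 256 threads
__global__ void fillIndices(int *data)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   // unique index within the grid
    data[idx] = idx;
}

// ... later, on the host (d_data is a device allocation of at least 1024 ints):
fillIndices<<<4, 256>>>(d_data);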

answered by Robert Crovella


To elaborate on Robert's answer, here's an example of how you could use streams to make your two instances of Kernel1 run concurrently:

cudaStream_t stream1; cudaStreamCreate(&stream1);
cudaStream_t stream2; cudaStreamCreate(&stream2);

Kernel1<<<gridDim, blockDim, 0, stream1>>>(dst1, param1);
Kernel1<<<gridDim, blockDim, 0, stream2>>>(dst2, param2);

A few more notes about concurrent execution with streams:

  • If we launch a kernel without specifying a stream, e.g. Kernel1<<<g, b>>>(), and then launch a kernel in a specific stream, e.g. Kernel2<<<g, b, 0, stream>>>(), then Kernel2 will wait for Kernel1 to finish.
  • When a kernel is launched without a stream (Kernel1<<<g, b>>>()), NVIDIA calls this "using the NULL stream."
  • If you use cudaEvents, your work can sometimes get serialized even if you distribute the kernels over several streams.
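Putting the pieces together, a minimal end-to-end sketch of the stream lifecycle (error checking omitted; Kernel1, dst1/dst2, and param1/param2 are the names from the question) might look like:

cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);

// Each launch is issued to its own stream, so the two kernels may overlap
Kernel1<<<gridDim, blockDim, 0, stream1>>>(dst1, param1);
Kernel1<<<gridDim, blockDim, 0, stream2>>>(dst2, param2);

// Block the host until each stream's work has finished, then clean up
cudaStreamSynchronize(stream1);
cudaStreamSynchronize(stream2);
cudaStreamDestroy(stream1);
cudaStreamDestroy(stream2);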
answered by solvingPuzzles