Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

grid_group not found in CUDA 9

Tags:

cuda

I tried using Cooperative Groups in CUDA 9, but I get an error in compiling.
Does anyone know the solution?

The development environment is as follows:

  • CUDA 9
  • Kepler K80
  • Compute Capability: 3.7
#include <cstdint>
#include <iostream>
#include <vector>

#include <cooperative_groups.h>

__global__
void kernel(uint32_t values[])
{
    using namespace cooperative_groups;

    grid_group g = this_grid();
}

int main(void)
{
    constexpr uint32_t kNum = 1 << 24; 
    std::vector<uint32_t> h_values(kNum);
    uint32_t *d_values;

    cudaMalloc(&d_values, sizeof(uint32_t) * kNum);
    cudaMemcpy(d_values, h_values.data(), sizeof(uint32_t) * kNum, cudaMemcpyHostToDevice);

    const uint32_t thread_num = 256;
    const dim3 block(thread_num);
    const dim3 grid((kNum + block.x - 1) / block.x);
    void *params[] = {&d_values};

    cudaLaunchCooperativeKernel((void *)kernel, grid, block, params);

    cudaMemcpy(h_values.data(), d_values, sizeof(uint32_t) * kNum, cudaMemcpyDeviceToHost);

    cudaFree(d_values);

    return 0;
}
$ nvcc -arch=sm_37 test.cu --std=c++11 -o test
test.cu(12): error: identifier "grid_group" is undefined
test.cu(12): error: identifier "this_grid" is undefined
like image 995
kotatsu Avatar asked Jan 25 '18 02:01

kotatsu


1 Answers

The grid_group features are only supported in the Pascal architecture and later. You can try by compiling for, e.g., sm_60 (of course the executable won't run on your GPU). Additionally you need to enable relocatable device code (-rdc=true).

Unfortunately, the Programming Guide is not very clear about that. I couldn't find this information there. However it is mentioned in some posts on devblog.nvidia.com:

From https://devblogs.nvidia.com/cuda-9-features-revealed/

While Cooperative Groups works on all GPU architectures, certain functionality is inevitably architecture-dependent as GPU capabilities have evolved. Basic functionality, such as synchronizing groups smaller than a thread block down to warp granularity, is supported on all architectures, while Pascal and Volta GPUs enable new grid-wide and multi-GPU synchronizing groups.

Or at the very end of https://devblogs.nvidia.com/cooperative-groups/

New features in Pascal and Volta GPUs help Cooperative Groups go farther, by enabling creation and synchronization of thread groups that span an entire kernel launch running on one or even multiple GPUs.

like image 122
havogt Avatar answered Sep 21 '22 20:09

havogt