I tried using Cooperative Groups in CUDA 9, but I get an error when compiling.
Does anyone know the solution?
The development environment is as follows:
#include <cstdint>
#include <iostream>
#include <vector>
#include <cooperative_groups.h>
__global__
void kernel(uint32_t values[])
{
    using namespace cooperative_groups;
    grid_group g = this_grid();  // fails to compile for sm_37 (see errors below)
}
int main(void)
{
    constexpr uint32_t kNum = 1 << 24;
    std::vector<uint32_t> h_values(kNum);
    uint32_t *d_values;
    cudaMalloc(&d_values, sizeof(uint32_t) * kNum);
    cudaMemcpy(d_values, h_values.data(), sizeof(uint32_t) * kNum, cudaMemcpyHostToDevice);
    const uint32_t thread_num = 256;
    const dim3 block(thread_num);
    const dim3 grid((kNum + block.x - 1) / block.x);
    // Kernel arguments are passed as an array of pointers to each argument.
    void *params[] = {&d_values};
    // Note: even once this compiles, a cooperative launch needs every block to be
    // co-resident; 65536 blocks of 256 threads is more than any single GPU can hold.
    cudaLaunchCooperativeKernel((void *)kernel, grid, block, params);
    cudaMemcpy(h_values.data(), d_values, sizeof(uint32_t) * kNum, cudaMemcpyDeviceToHost);
    cudaFree(d_values);
    return 0;
}
$ nvcc -arch=sm_37 test.cu --std=c++11 -o test
test.cu(12): error: identifier "grid_group" is undefined
test.cu(12): error: identifier "this_grid" is undefined
The grid_group features are only supported on the Pascal architecture (sm_60) and later. You can confirm this by compiling for, e.g., sm_60 (of course, the executable won't run on your sm_37 GPU). Additionally, you need to enable relocatable device code (-rdc=true), e.g. nvcc -arch=sm_60 -rdc=true test.cu --std=c++11 -o test
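For reference, here is a minimal sketch of a grid-wide synchronizing kernel that builds under those constraints (sm_60 or newer, -rdc=true). The names `reverse_kernel` and `run_reverse` are hypothetical, chosen for illustration. Besides the compile flags, a cooperative launch requires every block of the grid to be resident on the GPU at the same time, so the sketch sizes the grid from `cudaOccupancyMaxActiveBlocksPerMultiprocessor` times the multiprocessor count and uses grid-stride loops. With the question's grid of (1 << 24) / 256 blocks, `cudaLaunchCooperativeKernel` would fail with `cudaErrorCooperativeLaunchTooLarge`.

```cuda
// Build (CUDA 9, Pascal or newer), e.g.:
//   nvcc -arch=sm_60 -rdc=true --std=c++11 coop.cu -o coop
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

namespace cg = cooperative_groups;

// Phase 1 writes a[i] = i. After grid.sync(), writes from every block are
// visible, so phase 2 can safely read elements written by other blocks.
__global__ void reverse_kernel(uint32_t *a, uint32_t *b, uint32_t n)
{
    cg::grid_group grid = cg::this_grid();

    // Grid-stride loops: the grid is sized to fit on the GPU, so it may have
    // fewer threads than there are elements.
    for (unsigned long long i = grid.thread_rank(); i < n; i += grid.size())
        a[i] = static_cast<uint32_t>(i);

    grid.sync();  // grid-wide barrier (sm_60+, and -rdc=true with CUDA 9)

    for (unsigned long long i = grid.thread_rank(); i < n; i += grid.size())
        b[i] = a[n - 1 - i];
}

// Hypothetical helper: runs reverse_kernel and copies b back into h_out.
cudaError_t run_reverse(std::vector<uint32_t> &h_out)
{
    uint32_t n = static_cast<uint32_t>(h_out.size());
    int dev = 0, coop = 0, num_sms = 0, blocks_per_sm = 0;
    cudaError_t err = cudaGetDevice(&dev);
    if (err != cudaSuccess) return err;

    // The device itself must support cooperative launches.
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    if (!coop) return cudaErrorNotSupported;

    // All blocks must be co-resident: size the grid by occupancy.
    const int threads = 256;
    cudaDeviceGetAttribute(&num_sms, cudaDevAttrMultiProcessorCount, dev);
    err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, reverse_kernel, threads, 0);
    if (err != cudaSuccess) return err;
    const dim3 grid(blocks_per_sm * num_sms), block(threads);

    uint32_t *d_a = nullptr, *d_b = nullptr;
    err = cudaMalloc(&d_a, sizeof(uint32_t) * n);
    if (err == cudaSuccess) err = cudaMalloc(&d_b, sizeof(uint32_t) * n);

    // args holds the address of each kernel parameter, in order.
    void *args[] = {&d_a, &d_b, &n};
    if (err == cudaSuccess)
        err = cudaLaunchCooperativeKernel((void *)reverse_kernel, grid, block, args);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_b, sizeof(uint32_t) * n, cudaMemcpyDeviceToHost);

    cudaFree(d_a);  // cudaFree(nullptr) is a no-op
    cudaFree(d_b);
    return err;
}
```

Calling `run_reverse` on a vector of size n should leave element i equal to n - 1 - i, which only holds if the barrier really separated the two phases across blocks.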
Unfortunately, the Programming Guide is not very clear about this architecture requirement; I couldn't find it there. However, it is mentioned in some posts on devblogs.nvidia.com:
From https://devblogs.nvidia.com/cuda-9-features-revealed/
While Cooperative Groups works on all GPU architectures, certain functionality is inevitably architecture-dependent as GPU capabilities have evolved. Basic functionality, such as synchronizing groups smaller than a thread block down to warp granularity, is supported on all architectures, while Pascal and Volta GPUs enable new grid-wide and multi-GPU synchronizing groups.
Or at the very end of https://devblogs.nvidia.com/cooperative-groups/
New features in Pascal and Volta GPUs help Cooperative Groups go farther, by enabling creation and synchronization of thread groups that span an entire kernel launch running on one or even multiple GPUs.
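Consistent with the first quote, the smaller groups do work on an sm_37 GPU. As a hedged sketch (the names `tile_sum_kernel` and `tile_sum` are hypothetical), a warp-sized `thread_block_tile<32>` reduction compiles with `-arch=sm_37` and needs neither `-rdc=true` nor a cooperative launch:

```cuda
// Builds for pre-Pascal GPUs too, e.g.: nvcc -arch=sm_37 --std=c++11 tile.cu -o tile
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

namespace cg = cooperative_groups;

// Each 32-thread tile sums its values with shuffles; lane 0 of each tile
// adds the tile's partial sum to *total.
__global__ void tile_sum_kernel(const uint32_t *values, uint32_t n, unsigned int *total)
{
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    const uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int v = (i < n) ? values[i] : 0u;  // out-of-range threads add 0

    // Tree reduction within the tile: offsets 16, 8, 4, 2, 1.
    // Every thread of the tile must reach these shuffles (no early return).
    for (unsigned int offset = 16; offset > 0; offset /= 2)
        v += tile.shfl_down(v, offset);

    if (tile.thread_rank() == 0)
        atomicAdd(total, v);
}

// Hypothetical helper: writes the sum of h_values, computed on the GPU, to *result.
cudaError_t tile_sum(const std::vector<uint32_t> &h_values, unsigned int *result)
{
    const uint32_t n = static_cast<uint32_t>(h_values.size());
    uint32_t *d_values = nullptr;
    unsigned int *d_total = nullptr;
    cudaError_t err = cudaMalloc(&d_values, sizeof(uint32_t) * n);
    if (err == cudaSuccess) err = cudaMalloc(&d_total, sizeof(unsigned int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_values, h_values.data(), sizeof(uint32_t) * n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(d_total, 0, sizeof(unsigned int));

    // An ordinary <<<>>> launch is enough: there is no grid-wide barrier here.
    const dim3 block(256);  // a multiple of the 32-thread tile size
    const dim3 grid((n + block.x - 1) / block.x);
    if (err == cudaSuccess) {
        tile_sum_kernel<<<grid, block>>>(d_values, n, d_total);
        err = cudaGetLastError();
    }
    // cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess)
        err = cudaMemcpy(result, d_total, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaFree(d_values);
    cudaFree(d_total);
    return err;
}
```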