I have recently installed CUDA on my Arch Linux machine through the system's package manager, and I have been trying to test whether it is working by running a simple vector addition program.
I simply copy-pasted the code from this tutorial (both the version using one kernel and the version using more) into a file titled cuda_test.cu
and run
> nvcc cuda_test.cu -o cuda_test
In either case, the program runs and I get no errors (the program doesn't crash, and its output reports that there were no errors). But when I try to run the CUDA profiler on the program:
> sudo nvprof ./cuda_test
I get this result:
==3201== NVPROF is profiling process 3201, command: ./cuda_test
Max error: 0
==3201== Profiling application: ./cuda_test
==3201== Profiling result:
No kernels were profiled.
No API activities were profiled.
==3201== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
The warning at the end is not my main problem or the topic of my question; my problem is the messages saying that no kernels were profiled and no API activities were profiled.
Does this mean that the program was run entirely on my CPU? Or is it an error in nvprof?
I have found a discussion about the same error here, but there the answer was that the wrong version of CUDA was installed. In my case, the installed version is the latest one available through the system's package manager (version 10.1.243-1).
Is there any way I can get nvprof to display the expected output?
Trying to adhere to the warning at the end does not solve the problem:
Adding a call to cudaProfilerStop() (or cuProfilerStop()), also adding cudaDeviceReset(); at the end as suggested, including the appropriate header (cuda_profiler_api.h or cudaProfiler.h), and compiling with
> nvcc cuda_test.cu -o cuda_test -lcuda
yields a program which still runs, but which, when nvprof is run on it, returns:
==12558== NVPROF is profiling process 12558, command: ./cuda_test
Max error: 0
==12558== Profiling application: ./cuda_test
==12558== Profiling result:
No kernels were profiled.
No API activities were profiled.
==12558== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
======== Error: Application received signal 139
This has not solved the original problem and has in fact created a new error; the same happens whether cudaProfilerStop() is used on its own or alongside cuProfilerStop() and cudaDeviceReset().
The code is, as mentioned, copied from a tutorial to test whether CUDA is working, though I have also included the calls to cudaProfilerStop() and cudaDeviceReset(); for clarity, it is included here:
#include <iostream>
#include <math.h>
#include <cuda_profiler_api.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main(void)
{
    int N = 1<<20;
    float *x, *y;

    cudaProfilerStart();

    // Allocate Unified Memory – accessible from CPU or GPU
    cudaMallocManaged(&x, N*sizeof(float));
    cudaMallocManaged(&y, N*sizeof(float));

    // initialize x and y arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Run kernel on 1M elements on the GPU
    add<<<1, 1>>>(N, x, y);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i]-3.0f));
    std::cout << "Max error: " << maxError << std::endl;

    // Free memory
    cudaFree(x);
    cudaFree(y);

    cudaDeviceReset();
    cudaProfilerStop();

    return 0;
}
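As a sanity check that is independent of nvprof (this snippet is not part of the tutorial code), one can also verify that a kernel really executes on the GPU by checking the CUDA error codes returned after a launch; a minimal standalone sketch of such a check might look like this:

#include <iostream>
#include <cuda_runtime.h>

// Trivial kernel: its only purpose is to confirm that a launch reaches the GPU.
__global__ void noop() {}

int main(void)
{
    noop<<<1, 1>>>();

    // cudaGetLastError() reports errors from the launch itself (e.g. no usable
    // device or a driver/runtime mismatch); cudaDeviceSynchronize() reports
    // errors that occur while the kernel executes.
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t syncErr   = cudaDeviceSynchronize();

    std::cout << "launch: " << cudaGetErrorString(launchErr) << std::endl;
    std::cout << "sync:   " << cudaGetErrorString(syncErr)   << std::endl;

    // Both lines printing "no error" means the kernel was accepted and ran on
    // the device, rather than the work silently never reaching the GPU.
    return 0;
}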
This problem was apparently somewhat well known; after some searching I found this thread about the error code in the edited version. The solution discussed there is to call nvprof with the flag --unified-memory-profiling off:
> sudo nvprof --unified-memory-profiling off ./cuda_test
This makes nvprof work as expected, even without the call to cudaProfilerStop().
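For completeness (this variant is not from the linked thread, and it is only an assumption that it avoids the issue): since the flag that fixes profiling is specifically the unified-memory one, a version of the same test that uses explicit device allocations and copies instead of cudaMallocManaged would not exercise unified memory at all. A sketch of that variant:

#include <iostream>
#include <math.h>

__global__
void add(int n, float *x, float *y)
{
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main(void)
{
    int N = 1<<20;
    size_t bytes = N*sizeof(float);

    // Host buffers, initialized on the CPU
    float *h_x = new float[N];
    float *h_y = new float[N];
    for (int i = 0; i < N; i++) {
        h_x[i] = 1.0f;
        h_y[i] = 2.0f;
    }

    // Explicit device allocations and copies instead of unified memory
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    // Run kernel with one block of 256 threads
    add<<<1, 256>>>(N, d_x, d_y);

    // Copy the result back (synchronous on the default stream) and check it
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(h_y[i]-3.0f));
    std::cout << "Max error: " << maxError << std::endl;

    cudaFree(d_x);
    cudaFree(d_y);
    delete[] h_x;
    delete[] h_y;
    return 0;
}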