cudaDeviceReset for multiple GPUs

Question

I am currently working on a GPU server that has 4 Tesla T10 GPUs. Because I keep testing kernels and frequently have to kill the processes with Ctrl-C, I added a few lines to the end of a simple device query program. The code is given below:

#include <stdio.h>
#include <cuda_runtime.h>

// Print device properties
void printDevProp(cudaDeviceProp devProp)
{
    printf("Major revision number:         %d\n",  devProp.major);
    printf("Minor revision number:         %d\n",  devProp.minor);
    printf("Name:                          %s\n",  devProp.name);
    // size_t fields are printed with %zu; %u would truncate them on 64-bit systems
    printf("Total global memory:           %zu\n", devProp.totalGlobalMem);
    printf("Total shared memory per block: %zu\n", devProp.sharedMemPerBlock);
    printf("Total registers per block:     %d\n",  devProp.regsPerBlock);
    printf("Warp size:                     %d\n",  devProp.warpSize);
    printf("Maximum memory pitch:          %zu\n", devProp.memPitch);
    printf("Maximum threads per block:     %d\n",  devProp.maxThreadsPerBlock);
    for (int i = 0; i < 3; ++i)
        printf("Maximum dimension %d of block:  %d\n", i, devProp.maxThreadsDim[i]);
    for (int i = 0; i < 3; ++i)
        printf("Maximum dimension %d of grid:   %d\n", i, devProp.maxGridSize[i]);
    printf("Clock rate:                    %d\n",  devProp.clockRate);
    printf("Total constant memory:         %zu\n", devProp.totalConstMem);
    printf("Texture alignment:             %zu\n", devProp.textureAlignment);
    printf("Concurrent copy and execution: %s\n",  (devProp.deviceOverlap ? "Yes" : "No"));
    printf("Number of multiprocessors:     %d\n",  devProp.multiProcessorCount);
    printf("Kernel execution timeout:      %s\n",  (devProp.kernelExecTimeoutEnabled ? "Yes" : "No"));
}

int main()
{
    // Number of CUDA devices
    int devCount;
    cudaGetDeviceCount(&devCount);
    printf("CUDA Device Query...\n");
    printf("There are %d CUDA devices.\n", devCount);

    // Iterate through devices
    for (int i = 0; i < devCount; ++i)
    {
        // Get device properties
        printf("\nCUDA Device #%d\n", i);
        cudaDeviceProp devProp;
        cudaGetDeviceProperties(&devProp, i);
        printDevProp(devProp);
    }

    printf("\nPress any key to exit...");
    char c;
    scanf("%c", &c);

    // Added lines: select each device in turn and reset it
    for (int i = 0; i < devCount; i++) {
        cudaSetDevice(i);
        cudaDeviceReset();
    }

    return 0;
}

My question concerns the for loop just before main() ends, in which I select each device in turn and then call cudaDeviceReset. The code doesn't produce any error, but I have a strange feeling that it is not resetting all the devices. Instead, the program seems to reset only the default device, i.e. device 0, each time. Can anyone tell me what I should do to reset each of the 4 devices?

Thanks

FizxMike · Accepted Answer

It looks like you can add a function to your GPU programs to catch the ctrl+c signal (SIGINT) and call the cudaDeviceReset() function for each device that was used by the program.

The example code to call a function when SIGINT is caught can be found here:

https://stackoverflow.com/a/482725

It seems like a good practice to include code like this for every GPU program you write, and I will do the same :-)

I don't have time to write up a full detailed answer, so read the other answer and its comments as well.
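A minimal sketch of that idea, not a definitive implementation. The handler only sets a flag, because CUDA runtime calls and stream I/O are not async-signal-safe, and the per-device reset loop from the question runs later from normal code. The names g_stop, on_sigint, install_sigint_handler and reset_all_devices are hypothetical, and the __has_include guard is an assumption added so the sketch also builds on a machine without the CUDA toolkit (where it simply resets nothing).

```cpp
#include <csignal>
#include <cstdio>

// Assumption: only pull in the CUDA runtime when its header is available.
#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define SKETCH_HAVE_CUDA 1
#  endif
#endif

// Set by the signal handler, polled by the main loop.
static volatile std::sig_atomic_t g_stop = 0;

// Handler does the minimum: record that Ctrl+C was pressed.
extern "C" void on_sigint(int) { g_stop = 1; }

void install_sigint_handler() { std::signal(SIGINT, on_sigint); }

// Resets this process's context on every visible device.
// Returns the number of devices that were reset without error.
int reset_all_devices() {
    int done = 0;
#ifdef SKETCH_HAVE_CUDA
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int i = 0; i < count; ++i) {
        cudaError_t err = cudaSetDevice(i);
        if (err == cudaSuccess) err = cudaDeviceReset();
        if (err == cudaSuccess) {
            ++done;
        } else {
            std::fprintf(stderr, "device %d: %s\n", i, cudaGetErrorString(err));
        }
    }
#endif
    return done;
}
```

In main, one would call install_sigint_handler() before the work loop, keep looping while g_stop is 0, and call reset_all_devices() just before returning. Checking the return codes also shows whether each cudaSetDevice/cudaDeviceReset pair actually succeeded, which the loop in the question never verifies.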

Alberto · Answer

This is probably too late, but if you write a signal-handler function you can avoid the memory leaks and reset the devices reliably:

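#include <csignal>   // SIGINT, std::sig_atomic_t
#include <cstdlib>   // exit
#include <iostream>  // std::cout, std::cerr

// State variables for the SIGINT handler.
// volatile sig_atomic_t so the main loop reliably sees changes made in the handler.
extern volatile std::sig_atomic_t no_sigint;
volatile std::sig_atomic_t no_sigint = 1;
extern volatile std::sig_atomic_t interrupts;
volatile std::sig_atomic_t interrupts = 0;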

/* Catches signal interrupts from Ctrl+c.
   If 1 signal is detected the simulation finishes the current frame and
   exits in a clean state. If Ctrl+c is pressed again it terminates the
   application without completing writes to files or calculations but
   deallocates all memory anyway. */
void
sigint_handler (int sig)
{
  if (sig == SIGINT)
    {
      interrupts += 1;
      std::cout << std::endl
                << "Aborting loop.. finishing frame."
                << std::endl;

      no_sigint = 0;

      if (interrupts >= 2)
        {
          std::cerr << std::endl
                    << "Multiple Interrupts issued: "
                    << "Clearing memory and Forcing immediate shutdown!"
                    << std::endl;

          // write a function to free dynamically allocated memory
          free_mem ();

          int devCount;
          cudaGetDeviceCount (&devCount);

          for (int i = 0; i < devCount; ++i)
            {
              cudaSetDevice (i);
              cudaDeviceReset ();
            }
          exit (9);
        }
    }
}

....

int main(){ 
.....
for (int simulation_step=1 ; simulation_step < SIM_STEPS && no_sigint; ++simulation_step)
{
   .... simulation code
}
free_mem();
... cuda device resets
return 0;
}

If you use this code (you can even put the first snippet in an external header), it works. You get two levels of control over Ctrl+C: the first press stops your simulation and exits normally, but the application finishes rendering the current step, which is great for stopping gracefully with correct results. If you press Ctrl+C again, it closes the application immediately, freeing all memory.

talonmies · Answer

cudaDeviceReset is intended for destroying resources associated with a given GPU context within the process in which it is run. One CUDA process can't reset or otherwise affect the context of another process. So when your modified device query calls cudaDeviceReset, it only releases resources that it allocated itself, not those in use by any other process.
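One way to observe this per-process scope is to compare the free memory reported by cudaMemGetInfo before and after a reset. The sketch below is illustrative only: freed_by_reset is a hypothetical helper, the allocation size is up to the caller, and the __has_include guard is an assumption so that it still compiles without the CUDA toolkit (in which case it just returns false).

```cpp
#include <cstddef>

// Assumption: only pull in the CUDA runtime when its header is available.
#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define SKETCH_HAVE_CUDA 1
#  endif
#endif

// Allocates `bytes` on device `dev`, resets this process's context on it,
// and stores the change in reported free memory in *delta.
// Returns false (leaving *delta untouched) if any step fails or CUDA is absent.
bool freed_by_reset(int dev, std::size_t bytes, long long* delta) {
#ifdef SKETCH_HAVE_CUDA
    void* p = nullptr;
    std::size_t free_before = 0, free_after = 0, total = 0;
    if (cudaSetDevice(dev) != cudaSuccess) return false;
    if (cudaMalloc(&p, bytes) != cudaSuccess) return false;
    if (cudaMemGetInfo(&free_before, &total) != cudaSuccess) return false;
    // Destroys this process's context on `dev`, releasing `p` along with it.
    if (cudaDeviceReset() != cudaSuccess) return false;
    // The next runtime call creates a fresh context on the same device.
    if (cudaMemGetInfo(&free_after, &total) != cudaSuccess) return false;
    *delta = static_cast<long long>(free_after) -
             static_cast<long long>(free_before);
    return true;
#else
    (void)dev; (void)bytes; (void)delta;
    return false;
#endif
}
```

Memory held by other processes on the same device is not released by this call, so a killed job that still shows up as using the GPU has to be cleaned up from its own process (or by ending that process), not from a separate device query program.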

Tags:

cuda

Abhinav
