
Remote CUDA profiling?

Is it possible to remotely execute a CUDA profile execution (similar to computeprof) and then bring the profile back for analysis?

The particular remote machine is headless and not-under-my-control, so no X, no Qt libraries, etc.

Bolster asked May 05 '11 18:05



1 Answer

Yes, you can. The CUDA driver has built-in profiling facilities. The details are in the Compute_Profiler.txt file you will find in the doc directory of the toolkit, but the basic idea is something like this:

$ COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=1 COMPUTE_PROFILE_LOG=log.csv COMPUTE_PROFILE_CONFIG=config.txt ./app

This tells the runtime to turn on profiling, write the output in CSV format to log.csv, and collect the profile statistics listed in config.txt. After the app has run, the runtime drops an output file containing the raw profiling results. You can then use the tool of your choice to look at them.

The visual profiler can be convinced to open the output, but a lot of the fancy synchronization it does requires the output to be generated using its own profile configuration files (under the hood it does the same thing you are doing manually, but on the fly). I have done some digging around and scraped copies of those configuration files so that I could regenerate specific application profiling runs on headless cluster nodes without the profiler. Not too much fun, but it can be done.
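For repeated runs on a headless node, the environment variables from the command above could be wrapped in a small helper. This is only a rough sketch: `write_profile_config` and `profile_run` are hypothetical names of my own, and the option names written to the config file are assumptions based on the legacy command-line profiler, so check them against Compute_Profiler.txt in your toolkit before relying on them.

```shell
#!/bin/sh
# Sketch only: write_profile_config and profile_run are hypothetical helpers,
# not part of the CUDA toolkit.

# Write a profiler config file with one option per line.
# ASSUMPTION: these option names come from the legacy command-line profiler;
# Compute_Profiler.txt lists what your toolkit version actually accepts.
write_profile_config() {
    cat > "$1" <<'EOF'
gpustarttimestamp
gridsize
threadblocksize
EOF
}

# Run a command with the profiling environment variables set.
# usage: profile_run LOGFILE CONFIGFILE COMMAND [ARGS...]
profile_run() {
    log=$1
    config=$2
    shift 2
    COMPUTE_PROFILE=1 \
    COMPUTE_PROFILE_CSV=1 \
    COMPUTE_PROFILE_LOG="$log" \
    COMPUTE_PROFILE_CONFIG="$config" \
    "$@"
}

# Example use on the remote node (./app is the application from the answer):
#   write_profile_config config.txt
#   profile_run log.csv config.txt ./app
#
# Then, from your own machine (user@node and the run/ path are placeholders):
#   scp user@node:run/log.csv .
```

Because the log is a plain CSV file, copying it back with scp (or anything else that moves files) is all that is needed to analyse it locally; nothing on the remote node requires X or Qt.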

talonmies answered Sep 20 '22 02:09