Is it possible to remotely execute a CUDA profile execution (similar to computeprof) and then bring the profile back for analysis?
The particular remote machine is headless and not-under-my-control, so no X, no Qt libraries, etc.
Profiling Overview: The Visual Profiler is a graphical profiling tool that displays a timeline of your application's CPU and GPU activity, and that includes an automated analysis engine to identify optimization opportunities. The nvprof profiling tool enables you to collect and view profiling data from the command line.
nvprof can collect a timeline of CUDA-related activities on both the CPU and the GPU, including kernel execution, memory transfers, memory sets, and CUDA API calls, as well as events or metrics for CUDA kernels.
The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X, and Windows.
Yes, you can. The CUDA driver has built-in profiling facilities. How to use them is described in the Compute_Profiler.txt file in the doc directory of the toolkit, but the basic idea is something like this:
$ COMPUTE_PROFILE=1 COMPUTE_PROFILE_CSV=1 COMPUTE_PROFILE_LOG=log.csv COMPUTE_PROFILE_CONFIG=config.txt ./app
which tells the runtime to turn on profiling and write CSV-format output to log.csv, including the profile statistics listed in config.txt. After the app has run, the runtime will drop an output file with the raw profiling results in it. You can then use the tool of your choice to look at them. The visual profiler can be convinced to open the output, but a lot of the fancy synchronization it does requires the output to be generated using its own profile configuration files (under the hood it does the same thing you do manually, but on the fly). I have done some digging around and scraped copies of those configuration files so I could regenerate specific application profiling runs on headless cluster nodes without the profiler. Not too much fun, but it can be done.
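If you just want a quick look at the numbers without the visual profiler, something like the following might do once log.csv is back on your own machine. This is only a sketch: `summarize_profile` is a made-up helper name, and it assumes a log layout that the answer above does not spell out, namely metadata lines starting with `#`, then one header line containing `method` and `gputime` columns, then one comma-separated record per line. Check a real log (and Compute_Profiler.txt) before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: total GPU time per method from a profiler CSV log.
# ASSUMED layout (verify against your own log.csv):
#   - lines starting with '#' are metadata
#   - the first remaining line is a header naming "method" and "gputime"
#   - every later line is one comma-separated record
# Output: method,call_count,total_gputime sorted by total, largest first.
summarize_profile() {
    awk -F',' '
        /^#/ || /^[ \t]*$/ { next }         # skip metadata and blank lines
        !have_header {                      # first real line: locate columns
            for (i = 1; i <= NF; i++) {
                if ($i == "method")  m = i
                if ($i == "gputime") g = i
            }
            have_header = 1
            next
        }
        m && g {                            # accumulate per method name
            total[$m] += $g
            calls[$m]++
        }
        END {
            for (name in total)
                printf "%s,%d,%.3f\n", name, calls[name], total[name]
        }
    ' "$1" | sort -t',' -k3,3nr
}
```

Typical use would be to copy the log back first, for example with `scp user@remotehost:run/log.csv .` (host and path are placeholders), and then run `summarize_profile log.csv`. The columns are looked up by header name rather than by position because the set of columns depends on what config.txt asks for.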