
nvidia-smi GPU performance measure does not make sense


I am using an NVIDIA GTX Titan X for deep learning experiments. I use nvidia-smi to monitor the GPU's running state, but the perf(ormance) state the tool reports does not make sense to me.

I checked the nvidia-smi manual, which says the following:

Performance State The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance).

Without any process running on the GPU (idle state), the GPU performance state is P0. However, when running some computation-heavy process, the state becomes P2.

My question is: why is my GPU in the P0 state when idle, but in P2 when running a heavy computation task? Shouldn't it be the opposite?

Also, is there a way to make my GPU always run in the P0 state (maximum performance)?

asked Jun 05 '15 by jiajun

People also ask

What does nvidia-smi tell you?

nvidia-smi ships with the NVIDIA GPU display drivers on Linux, and with 64-bit Windows Server 2008 R2 and Windows 7. It can report query information as XML or human-readable plain text, to either standard output or a file.
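For example, the full query report can be dumped as plain text or as XML (a minimal sketch; the output file name is arbitrary):

```bash
# Full GPU report as human-readable text on standard output.
nvidia-smi -q

# The same report as XML, redirected to a file.
nvidia-smi -q -x > gpu_report.xml
```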

What is the NVIDIA P0 state?

P-States are GPU active/executing performance capability and power consumption states. P-States range from P0 to P15, with P0 being the highest performance/power state, and P15 being the lowest performance/power state. Each P-State maps to a performance level.
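As a quick way to see this value, nvidia-smi can query the P-state directly (a small sketch using the tool's query interface):

```bash
# Print the current performance state (P-state) of each GPU in the system.
nvidia-smi --query-gpu=index,name,pstate --format=csv
```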

What is MiB in nvidia-smi?

In the nvidia-smi output, memory figures are reported in MiB (mebibytes). The value near the top of the table, in the format {Used} MiB / {Total} MiB, is the overall memory usage of the GPU; the per-process values listed at the bottom, in the format {Used} MiB, show how much GPU memory each individual process is using.

How do I check GPU usage with nvidia-smi?

Open a command prompt and change to the directory containing the tool if it is not on your PATH (on Windows this is typically C:\Program Files\NVIDIA Corporation\NVSMI). Run nvidia-smi -l 10, which instructs nvidia-smi to refresh every 10 seconds, and review the usage summary it prints.
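In shell form (on Linux, nvidia-smi is normally already on the PATH, so no cd step is needed):

```bash
# Show the nvidia-smi usage summary, refreshing every 10 seconds.
# Press Ctrl+C to stop.
nvidia-smi -l 10
```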


1 Answer

It is confusing.

The nvidia-smi manual is correct, however.

When a GPU or set of GPUs is idle, the act of running nvidia-smi on the machine will usually bring one of those GPUs out of the idle state. This is due to the information the tool is collecting - it needs to wake up one of the GPUs.

This wake up process will initially bring the GPU to P0 state (highest perf state), but the GPU driver will monitor that GPU, and eventually start to reduce the performance state to save power, if the GPU is idle or not particularly busy.

On the other hand, when the GPUs are active with a workload, the GPU driver will, according to its own heuristic, continuously adjust the performance state to deliver best performance while matching the performance state to the actual workload. If no thermal or power limits are reached, the perf state should reach its highest level (P0) for the most active and heaviest, continuous workloads.

Workloads that are periodically heavy, but not continuous, may see the GPU power state fluctuate around levels P0-P2. GPUs that are "throttled" due to thermal (temperature) or power issues may also see reduced P-states. This type of throttling is evident and reported separately in nvidia-smi, but this type of reporting may not be enabled for all GPU types.
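Where this reporting is supported, the throttle reasons can be inspected directly; a sketch, with the relevant section of the output being "Clocks Throttle Reasons":

```bash
# Display performance-related state, including clock throttle reasons
# (thermal, power, etc.), on GPUs that support this reporting.
nvidia-smi -q -d PERFORMANCE
```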

If you want to see the P0 state on your GPU, the best advice I can offer is to run a short, heavy, continuous workload (something that does a large sgemm operation, for example), and then monitor the GPU during that workload. It should be possible to see P0 state in that situation.
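A minimal monitoring sketch, assuming some continuous compute-heavy program is available (./heavy_sgemm below is a placeholder for whatever workload you use, e.g. a large cuBLAS SGEMM loop or a training run):

```bash
# Launch the (placeholder) heavy workload in the background ...
./heavy_sgemm &
WORK_PID=$!

# ... and watch the P-state, utilization and clocks once per second.
nvidia-smi --query-gpu=pstate,utilization.gpu,clocks.sm,temperature.gpu \
           --format=csv -l 1 &
MON_PID=$!

wait "$WORK_PID"
kill "$MON_PID"
```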

If you are using a machine learning application (e.g. Caffe) that is using the cuDNN library, and you are training a large network, it should be possible to see P0 from time to time, because cuDNN does operations that are something like sgemm in this scenario, typically.

But for a sporadic workload, it's quite possible that the most commonly observed state would be P2.

To "force" a P0 power state always, you can try experimenting with the persistence mode and applications clocks via the nvidia-smi tool. Use nvidia-smi --help or the man page for nvidia-smi to understand the options.

Although I don't think this will typically apply to Tesla GPUs, some NVIDIA GPUs may limit themselves to a P2 power state under compute load unless application clocks are specifically set higher. Use the nvidia-smi -a command to see the current Application Clocks, the Default Application Clocks, and the Max Clocks available for your GPU. (Some GPUs, including older GPUs, may display N/A for some of these fields; that generally indicates the application clocks are not modifiable via nvidia-smi.)

If a card seems to run at the P2 state during compute load, you may be able to raise it to the P0 state by increasing the application clocks to the maximum available (i.e. Max Clocks). Use nvidia-smi --help to learn how to format the command that changes the application clocks on your GPU. Modifying application clocks, or enabling modifiable application clocks, may require root/admin privilege.

It may also be desirable or necessary to set the GPU persistence mode. This prevents the driver from "unloading" during periods of GPU inactivity, which would otherwise cause the application clocks to be reset when the driver re-loads.
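A sketch of that sequence (the clock values below are placeholders; substitute the memory,graphics pair your own GPU lists under Max Clocks):

```bash
# Inspect the current, default application, and max clocks.
nvidia-smi -q -d CLOCK

# Set application clocks to the maximum supported values
# (placeholder numbers -- use your GPU's reported Max Clocks).
# Usually requires root/admin privilege.
sudo nvidia-smi -ac 3505,1392
```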

This default behavior of limiting the affected cards to P2 under compute load is by design of the GPU driver.

This somewhat related question/answer may also be of interest.

answered Oct 13 '22 by Robert Crovella