OpenGL: How to get GPU usage percent?

Tags: c++, opengl

Is this even possible?

Newbie asked Sep 23 '10 12:09



3 Answers

Not really, but you can get different performance counters using your vendor's utilities. For NVIDIA, you have NVPerfKit and NVPerfHUD; other vendors have similar utilities.

Dr. Snoopy answered Sep 23 '22 06:09


Nope. GPU usage is even hard to define rigorously in such a highly parallel environment. However, you can approximate it with the ARB_timer_query extension.

Yakov Galka answered Sep 21 '22 06:09


I have implemented a timer-query-based framework for measuring GPU execution time in my OpenGL rendering thread implementation. I'll share the timer query parts below:

Assume

  • enqueue runs a function on the rendering thread
  • limiter.frame60 is only equal to 0 once every 60 frames

Code:

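#include <deque>
#include <iomanip>
#include <string>
#include <utility>

// One in-flight GL_TIME_ELAPSED query and the label to print with its result.
struct TimerQuery
{
    std::string description;
    GLuint timer;
};
typedef std::deque<TimerQuery> TimerQueryQueue;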

...

TimerQueryQueue timerQueryQueue;

...

void GlfwThread::beginTimerQuery(std::string description)
{
    if (limiter.frame60 != 0)
        return;

    enqueue([this](std::string const& description) {
        GLuint id;
        glGenQueries(1, &id);
        timerQueryQueue.push_back({ description, id });
        glBeginQuery(GL_TIME_ELAPSED, id);
    }, std::move(description));
}

void GlfwThread::endTimerQuery()
{
    if (limiter.frame60 != 0)
        return;

    enqueue([this]{
        glEndQuery(GL_TIME_ELAPSED);
    });
}


void GlfwThread::dumpTimerQueries()
{
    // Results complete in issue order, so only the head of the queue is checked.
    while (!timerQueryQueue.empty())
    {
        TimerQuery& next = timerQueryQueue.front();

        // Poll without blocking; stop at the first query that is still pending.
        GLint isAvailable = GL_FALSE;
        glGetQueryObjectiv(next.timer,
                           GL_QUERY_RESULT_AVAILABLE,
                           &isAvailable);
        if (!isAvailable)
            return;

        // GL_TIME_ELAPSED results are reported in nanoseconds.
        GLuint64 ns;
        glGetQueryObjectui64v(next.timer, GL_QUERY_RESULT, &ns);

        DebugMessage("timer: ",
                     next.description, " ",
                     std::fixed,
                     std::setprecision(3), std::setw(8),
                     ns / 1000.0, Stopwatch::microsecText);

        glDeleteQueries(1, &next.timer);

        timerQueryQueue.pop_front();
    }
}

Here is some example output:

Framerate t=5.14 fps=59.94 fps_err=-0.00 aet=2850.67μs adt=13832.33μs alt=0.00μs cpu_usage=17%
instanceCount=20301 parallel_μs=2809
timer: text upload range    0.000μs
timer: clear and bind   95.200μs
timer: upload    1.056μs
timer: draw setup    1.056μs
timer: draw  281.568μs
timer: draw cleanup    1.024μs
timer: renderGlyphs    1.056μs
Framerate t=6.14 fps=59.94 fps_err=0.00 aet=2984.55μs adt=13698.45μs alt=0.00μs cpu_usage=17%
instanceCount=20361 parallel_μs=2731
timer: text upload range    0.000μs
timer: clear and bind   95.232μs
timer: upload    1.056μs
timer: draw setup    1.024μs
timer: draw  277.536μs
timer: draw cleanup    1.056μs
timer: renderGlyphs    1.024μs
Framerate t=7.14 fps=59.94 fps_err=-0.00 aet=3007.05μs adt=13675.95μs alt=0.00μs cpu_usage=18%
instanceCount=20421 parallel_μs=2800
timer: text upload range    0.000μs
timer: clear and bind   95.232μs
timer: upload    1.056μs
timer: draw setup    1.056μs
timer: draw  281.632μs
timer: draw cleanup    1.024μs
timer: renderGlyphs    1.056μs

This allows me to call renderThread->beginTimerQuery("draw some text"); right before my OpenGL draw calls (or whatever else I want to measure), and renderThread->endTimerQuery(); right after them, to measure the elapsed GPU execution time.

The idea is that a command is issued to the GPU command queue right before the measured section, so glBeginQuery with GL_TIME_ELAPSED records the value of some implementation-defined counter. glEndQuery then issues a GPU command to store the difference between the current count and the one recorded at the beginning of the query. The GPU stores that result in the query object, and it becomes "available" at some asynchronous future time. My code keeps a queue of issued timer queries and checks once per second for finished measurements. dumpTimerQueries keeps printing measurements as long as the result for the timer query at the head of the queue is available. Eventually it hits a query whose result is not available yet and stops printing.
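The "stop at the first unfinished query" behavior can be looked at in isolation from OpenGL. The following is only a sketch: PendingQuery and drainFinished are hypothetical names, and the available flag stands in for what glGetQueryObjectiv would report for GL_QUERY_RESULT_AVAILABLE, so that the logic runs without a GL context.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a GL query object (not part of the original code).
struct PendingQuery
{
    std::string description;
    bool available;    // in real code: result of GL_QUERY_RESULT_AVAILABLE
    std::uint64_t ns;  // in real code: result of GL_QUERY_RESULT
};

// Pops finished queries from the front of the queue, in issue order, and
// stops at the first one whose result is not available yet. Later entries
// are left alone even if they happen to be marked available.
std::vector<std::string> drainFinished(std::deque<PendingQuery>& queue)
{
    std::vector<std::string> finished;
    while (!queue.empty() && queue.front().available)
    {
        finished.push_back(queue.front().description);
        queue.pop_front();
    }
    return finished;
}
```

With a queue holding a finished query followed by a pending one, a single call returns only the first description and leaves the rest queued for the next poll.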

I added a feature that drops 59 out of every 60 calls to the measurement functions, so all the instrumentation in my program measures only once per second (at roughly 60 fps). This prevents too much spam, which makes it practical to dump to stdout during development, and it limits the performance interference caused by the measurements. That is what limiter.frame60 is for: frame60 is guaranteed to be < 60, and it wraps.
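The limiter type itself is not shown in this answer, so here is a minimal sketch of how such a wrapping counter might look. FrameLimiter, advance, shouldMeasure and countMeasuredFrames are all hypothetical names; only the frame60 field and its "< 60, wraps" behavior come from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for the answer's `limiter`; the real type is not shown.
struct FrameLimiter
{
    unsigned frame60 = 0;  // always in the range [0, 59]

    // Call once per rendered frame; wraps back to 0 after 59.
    void advance() { frame60 = (frame60 + 1) % 60; }

    // True only on the one frame in 60 that should be instrumented.
    bool shouldMeasure() const { return frame60 == 0; }
};

// Counts how many of `frames` consecutive frames would be instrumented,
// starting from a fresh limiter.
unsigned countMeasuredFrames(unsigned frames)
{
    FrameLimiter limiter;
    unsigned measured = 0;
    for (unsigned i = 0; i < frames; ++i)
    {
        if (limiter.shouldMeasure())
            ++measured;
        limiter.advance();
        assert(limiter.frame60 < 60);  // the wrap keeps the counter bounded
    }
    return measured;
}
```

Because every begin/end pair checks the same counter within a frame, either all timers in that frame are recorded or none are, which keeps the per-frame numbers comparable.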

While this doesn't perfectly answer the question, you can infer the GPU usage by comparing the total elapsed time of all the draw calls with the elapsed wall clock time. If the frame took 16 ms and the GL_TIME_ELAPSED timer queries totaled 8 ms, you can infer approximately 50% GPU usage.
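That arithmetic can be written down as a small helper. This is a sketch rather than part of the framework above: estimateGpuUsagePercent is a hypothetical name, and it assumes you have already collected the GL_TIME_ELAPSED results for one frame and measured that frame's wall clock duration yourself.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the original code): estimates GPU usage as
// the share of one frame's wall clock time covered by timer query results.
//   gpuNs   - nanoseconds reported by each GL_TIME_ELAPSED query in the frame
//   frameNs - wall clock duration of that frame, in nanoseconds
double estimateGpuUsagePercent(std::vector<std::uint64_t> const& gpuNs,
                               std::uint64_t frameNs)
{
    assert(frameNs > 0 && "frame duration must be non-zero");
    std::uint64_t const totalNs =
        std::accumulate(gpuNs.begin(), gpuNs.end(), std::uint64_t{0});
    return 100.0 * static_cast<double>(totalNs) / static_cast<double>(frameNs);
}
```

For the 8 ms out of 16 ms case this gives 50. Applied to the first sample in the output above, the timers add up to about 381 μs against a frame of about 16.7 ms at 59.94 fps, which is roughly 2%. The estimate only covers sections that are actually wrapped in timer queries, so uninstrumented GPU work is not counted.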

One more note: what is measured is GPU execution time, obtained by putting GPU commands in the GPU queue. The threading has nothing to do with it; if the operations inside those enqueue calls were executed on a single thread, the result would be equivalent.

doug65536 answered Sep 24 '22 06:09