Is this even possible?
INSTRUCTIONS: Right-click on your desktop and select Nvidia Control Panel. In the menus, go to Manage 3D Settings. Change the power management mode from Adaptive to Prefer Maximum Performance, and adjust the remaining options toward maximum performance.
Your computer may be using integrated graphics. When the computer runs on the integrated GPU, the dedicated graphics card isn't doing anything; you can still see it in the Task Manager and other performance-tracking programs, but you'll see 0-1% GPU usage on its graphs.
If your GPU is underperforming and you see low usage, there might be an issue with your build, or your video card drivers and operating system may be out of date. If you haven't kept your operating system updated, check that none of the skipped updates relate to power management or the GPU.
Not really, but you can get different performance counters using your vendor's utilities; for NVIDIA there are NVPerfKit and NVPerfHUD. Other vendors have similar utilities.
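If a coarse whole-GPU utilization number is enough, NVIDIA's NVML library (shipped with the driver) also exposes one. A minimal sketch, assuming a single NVIDIA GPU at index 0 and linking against nvidia-ml:

#include <nvml.h>
#include <cstdio>

int main()
{
    // Initialize NVML; it talks to the driver directly, no OpenGL context needed.
    if (nvmlInit() != NVML_SUCCESS)
        return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        // util.gpu is the percentage of time the GPU was busy over the last
        // sampling period; util.memory is memory-controller activity.
        nvmlUtilization_t util;
        if (nvmlDeviceGetUtilizationRates(device, &util) == NVML_SUCCESS)
            std::printf("GPU %u%%, memory %u%%\n", util.gpu, util.memory);
    }

    nvmlShutdown();
    return 0;
}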
Nope. It's even hard to rigorously define in such a highly parallel environment. However, you can approximate it with the ARB_timer_query extension.
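A minimal, blocking sketch of that extension, assuming a current OpenGL 3.3+ context and a function loader such as glad or GLEW:

GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
// ... issue the draw calls you want to time ...
glEndQuery(GL_TIME_ELAPSED);

// GL_QUERY_RESULT waits until the GPU has finished the measured commands,
// then returns the elapsed time in nanoseconds.
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
glDeleteQueries(1, &query);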
I have implemented a timer query based GPU execution time measurement framework in my OpenGL rendering thread implementation. I'll share the timer query parts below:
Assume enqueue runs a function on the rendering thread, and limiter.frame60 is only equal to 0 once every 60 frames.
Code:
struct TimerQuery
{
    std::string description; // label printed with the measurement
    GLuint timer;            // OpenGL query object name
};

typedef std::deque<TimerQuery> TimerQueryQueue;

...

TimerQueryQueue timerQueryQueue; // queries issued but not yet read back

...

void GlfwThread::beginTimerQuery(std::string description)
{
    // Only instrument one frame out of every 60.
    if (limiter.frame60 != 0)
        return;

    enqueue([this](std::string const& description) {
        // Generate a query object, remember it, and start timing.
        GLuint id;
        glGenQueries(1, &id);
        timerQueryQueue.push_back({ description, id });
        glBeginQuery(GL_TIME_ELAPSED, id);
    }, std::move(description));
}

void GlfwThread::endTimerQuery()
{
    if (limiter.frame60 != 0)
        return;

    enqueue([this]{
        glEndQuery(GL_TIME_ELAPSED);
    });
}

void GlfwThread::dumpTimerQueries()
{
    // Print every finished measurement at the front of the queue; stop at
    // the first query whose result is not yet available on the GPU.
    while (!timerQueryQueue.empty())
    {
        TimerQuery& next = timerQueryQueue.front();

        int isAvailable = GL_FALSE;
        glGetQueryObjectiv(next.timer,
                           GL_QUERY_RESULT_AVAILABLE,
                           &isAvailable);
        if (!isAvailable)
            return;

        GLuint64 ns;
        glGetQueryObjectui64v(next.timer, GL_QUERY_RESULT, &ns);

        DebugMessage("timer: ",
                     next.description, " ",
                     std::fixed,
                     std::setprecision(3), std::setw(8),
                     ns / 1000.0, Stopwatch::microsecText);

        glDeleteQueries(1, &next.timer);
        timerQueryQueue.pop_front();
    }
}
Here is some example output:
Framerate t=5.14 fps=59.94 fps_err=-0.00 aet=2850.67μs adt=13832.33μs alt=0.00μs cpu_usage=17%
instanceCount=20301 parallel_μs=2809
timer: text upload range 0.000μs
timer: clear and bind 95.200μs
timer: upload 1.056μs
timer: draw setup 1.056μs
timer: draw 281.568μs
timer: draw cleanup 1.024μs
timer: renderGlyphs 1.056μs
Framerate t=6.14 fps=59.94 fps_err=0.00 aet=2984.55μs adt=13698.45μs alt=0.00μs cpu_usage=17%
instanceCount=20361 parallel_μs=2731
timer: text upload range 0.000μs
timer: clear and bind 95.232μs
timer: upload 1.056μs
timer: draw setup 1.024μs
timer: draw 277.536μs
timer: draw cleanup 1.056μs
timer: renderGlyphs 1.024μs
Framerate t=7.14 fps=59.94 fps_err=-0.00 aet=3007.05μs adt=13675.95μs alt=0.00μs cpu_usage=18%
instanceCount=20421 parallel_μs=2800
timer: text upload range 0.000μs
timer: clear and bind 95.232μs
timer: upload 1.056μs
timer: draw setup 1.056μs
timer: draw 281.632μs
timer: draw cleanup 1.024μs
timer: renderGlyphs 1.056μs
This allows me to call renderThread->beginTimerQuery("draw some text"); before my OpenGL draw calls (or whatever else I want to measure) and renderThread->endTimerQuery(); right after, to measure the elapsed GPU execution time.
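A hypothetical call site might look like this (the draw call is just a placeholder for whatever work is being measured; dumpTimerQueries() is assumed to run periodically elsewhere, e.g. once per frame, to print finished results):

renderThread->beginTimerQuery("draw some text");
glDrawArrays(GL_TRIANGLES, 0, vertexCount);   // the measured GPU work
renderThread->endTimerQuery();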
The idea here is that it issues a command into the GPU command queue right before the measured section, so glBeginQuery with GL_TIME_ELAPSED records the value of some implementation-defined counter. glEndQuery then issues a GPU command to store the difference between the current count and the one recorded at the start of the GL_TIME_ELAPSED query. That result is stored by the GPU in the query object and becomes "available" at some asynchronous future time. My code keeps a queue of issued timer queries and checks once per second for finished measurements. dumpTimerQueries keeps printing measurements as long as the result for the timer query at the head of the queue is available; eventually it hits a query that is not ready yet and stops printing.
I added one additional feature: it drops 59 out of every 60 calls to the measurement functions, so it only measures roughly once per second across all the instrumentation in my program. This keeps the output from spamming stdout during development and limits the performance interference caused by the measurements themselves. That is what the limiter.frame60 check is for: frame60 is a frame counter that wraps and is guaranteed to be < 60.
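The limiter type itself isn't shown in the snippet above; a hypothetical sketch of the wrapping counter it describes could be as simple as:

struct FrameLimiter
{
    int frame60 = 0;                               // always stays in [0, 59]
    void tick() { frame60 = (frame60 + 1) % 60; }  // call once per frame
};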
While this doesn't perfectly answer the question, you can infer GPU usage by comparing the elapsed time of all the draw calls against the elapsed wall-clock time. If the frame took 16 ms and the TIME_ELAPSED queries summed to 8 ms, you can infer approximately 50% GPU usage.
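Expressed as a hypothetical helper (not part of the framework above):

// Estimate GPU usage for one frame from the summed GL_TIME_ELAPSED results
// and the wall-clock frame time.
double estimateGpuUsagePercent(GLuint64 gpuTimeNs, double frameTimeMs)
{
    double gpuTimeMs = gpuTimeNs / 1.0e6;    // nanoseconds -> milliseconds
    return 100.0 * gpuTimeMs / frameTimeMs;  // e.g. 8 ms of 16 ms ≈ 50%
}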
One more note: what is measured is GPU execution time, obtained by putting GPU commands into the GPU command queue. The threading has nothing to do with it; if the operations inside those enqueue calls were executed on a single thread, the result would be equivalent.