Here is a very simple test program. When vsync is disabled, this program runs at 100 FPS and uses virtually 0% of the CPU. When I enable vsync, I get 60 FPS and 25% CPU utilization (100% of one core on a 4-core system). This is with an Nvidia GPU. Searching online led me to the suggestion to disable "multithreaded optimization" in the Nvidia control panel. That does decrease CPU utilization, but only to 10%. Furthermore, if I remove the call to sleep after SwapBuffers, I get 25% utilization again even with multithreaded optimization disabled. Can anyone shed some light on this? Am I doing something wrong? Is Nvidia's OpenGL implementation just hopelessly flawed?
#include <GLFW/glfw3.h>
#include <thread>
#include <chrono>   // std::chrono::milliseconds for the frame limiter below
#include <cstdlib>
#include <cstdio>

int main(int argc, char *argv[])
{
    if (!glfwInit())
        exit(EXIT_FAILURE);

    glfwWindowHint(GLFW_RESIZABLE, GL_FALSE);

    GLFWwindow* window = glfwCreateWindow(800, 600, "OpenGL Vsync Test", nullptr, nullptr);
    if (!window)
    {
        glfwTerminate();
        exit(EXIT_FAILURE);
    }

    glfwMakeContextCurrent(window);

#ifdef USE_VSYNC
    glfwSwapInterval(1);
#else
    glfwSwapInterval(0);
#endif

    glClearColor(1.0f, 0.0f, 0.0f, 1.0f);

    double lastTime = glfwGetTime();
    double nbFrames = 0;

    while (!glfwWindowShouldClose(window))
    {
        // Update the FPS counter in the window title once per second.
        double currentTime = glfwGetTime();
        nbFrames++;
        if (currentTime - lastTime >= 1.0)
        {
            char cbuffer[50];
            snprintf(cbuffer, sizeof(cbuffer), "OpenGL Vsync Test [%.1f fps, %.3f ms]", nbFrames, 1000.0 / nbFrames);
            glfwSetWindowTitle(window, cbuffer);
            nbFrames = 0;
            lastTime++;
        }

        glClear(GL_COLOR_BUFFER_BIT);
        glfwSwapBuffers(window);
        glfwPollEvents();

        // limit to 100 FPS for when vsync is disabled
        std::chrono::milliseconds dura(10);
        std::this_thread::sleep_for(dura);
    }

    glfwDestroyWindow(window);
    glfwTerminate();
    exit(EXIT_SUCCESS);
}
The OpenGL code in question is written specifically to be very GPU-heavy, which forces the CPU to wait while the GPU finishes its work. In particular, the wait happens at the glFinish() call, where the CPU spends a measured 99.87% of each frame blocked.
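The original listing isn't reproduced here, but a minimal sketch of the idea might look like the fragment below, dropped into a render loop like the one in the question. drawExpensiveScene() is a hypothetical placeholder for whatever heavy draw calls are issued; the timing just measures how long the CPU sits blocked in glFinish():

// Hypothetical sketch, not the original listing: time how long the CPU
// blocks in glFinish() after issuing a lot of GPU work.
double frameStart = glfwGetTime();
drawExpensiveScene();               // placeholder for the heavy draw calls
double beforeFinish = glfwGetTime();
glFinish();                         // CPU blocks here until the GPU catches up
double frameEnd = glfwGetTime();
double waitShare = (frameEnd - beforeFinish) / (frameEnd - frameStart);
printf("waited in glFinish() for %.2f%% of the frame\n", 100.0 * waitShare);
glfwSwapBuffers(window);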
For Ping (or any Internet application) to work, it has to make calls into the software (and hardware) network stack. When the CPU gets busy, it devotes less time and fewer resources to servicing that stack, which pushes up latency.
I hesitate to give this as an answer, as I don't really know the "answer," but hopefully I can shed some light on it.
I have an nVidia GPU as well and I've noticed the same thing. My guess is that the driver is essentially spin-waiting:
while(NotTimeToSwapYet()){}
(or whatever the fancy driver version of that looks like).
Using Process Hacker to sample some stack traces from the nvoglv32.dll thread, the thing at the top of the list about 99% of the time is KeAcquireSpinLockAtDpcLevel(), which is usually downstream of things like KiCheckForKernelApcDelivery() and EngUnlockDirectDrawSurface().
I'm not well versed enough in Windows driver programming to call this conclusive, but it certainly doesn't tell me I'm wrong either.
And it doesn't look like you're doing anything obviously wrong either. It's been my experience that swap timing in non-exclusive Windows applications is just really painful: there's a lot of trial and error involved, and a lot of variability between different systems. As far as I know there is no "right" way to do it that will work well all the time (please, someone tell me I'm wrong!).
In the past, I've been able to rely on vsync to keep CPU usage low (even if it did make things a bit less responsive), but that doesn't seem to be the case any longer. I switched from DirectX to OpenGL relatively recently, so I couldn't tell you if this is a recent change in nVidia's driver, or whether they just treat DX and OpenGL differently with respect to vsync.
After swapping buffers, call DwmFlush(), and it will no longer use 100% CPU!
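A minimal sketch of how that might slot into the loop from the question (Windows only; DwmFlush() is declared in dwmapi.h and you link against dwmapi.lib):

#include <dwmapi.h>   // DwmFlush(); link with dwmapi.lib

// ...inside the render loop from the question:
glClear(GL_COLOR_BUFFER_BIT);
glfwSwapBuffers(window);
DwmFlush();           // block until the DWM compositor presents a frame
glfwPollEvents();

Whether this actually stops the driver's busy-wait will likely depend on the driver and Windows version, so treat it as something to try rather than a guaranteed fix.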