I have an NVIDIA RTX 2070 GPU and CUDA installed, and I have WebGL support, but when I run the various TFJS examples, such as the Addition RNN Example or the Visualizing Training Example, I see my CPU usage go to 100% while the GPU (as metered via nvidia-smi) never gets used.
How can I troubleshoot this? I don't see any console messages about not finding the GPU. The TFJS docs are really vague about this, saying only that TFJS uses the GPU if WebGL is supported and otherwise falls back to the CPU. But again, WebGL is working. So how do I help it find my GPU?
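One first troubleshooting step is to ask TF.js directly which backend it ended up on: `tf.ready()` waits for backend initialization and `tf.getBackend()` returns the selected backend's name (both are public `@tensorflow/tfjs` functions). The sketch below is hypothetical: `describeActiveBackend`, `TfBackendInfo`, and `CPU_ONLY_BACKENDS` are illustrative names, and the interface exists only so the snippet type-checks without the tfjs package installed. In your own page you would pass the `tf` namespace from `import * as tf from "@tensorflow/tfjs"`. Note that a result of `"webgl"` only says TF.js is using WebGL; it does not say which physical GPU the browser's WebGL context runs on.

```typescript
// Structural stand-in for the two @tensorflow/tfjs functions used below
// (assumption: declared here so the sketch compiles without the package).
// The real `tf` namespace from "@tensorflow/tfjs" satisfies this shape.
interface TfBackendInfo {
  ready(): Promise<void>;
  getBackend(): string;
}

// Backends that run entirely on the CPU (plain JavaScript and WebAssembly).
const CPU_ONLY_BACKENDS: ReadonlySet<string> = new Set(["cpu", "wasm"]);

// Waits for TF.js to finish initializing its preferred backend, then
// describes which backend was actually selected.
async function describeActiveBackend(tf: TfBackendInfo): Promise<string> {
  await tf.ready();
  const backend = tf.getBackend();
  return CPU_ONLY_BACKENDS.has(backend)
    ? `TF.js backend is "${backend}": computations run on the CPU`
    : `TF.js backend is "${backend}"`;
}
```

In a page that imports TF.js you could call `describeActiveBackend(tf).then((msg) => console.log(msg));` before training starts.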
Other related SO questions seem to be about tfjs-node-gpu, e.g., getting one's own tfjs-node-gpu installation working. This is not about that. I'm talking about running the main TFJS examples on the official TFJS pages from my browser.
Browser is the latest Chrome for Linux. Running Ubuntu 18.04.
EDIT: Since someone will ask, chrome://gpu shows that hardware acceleration is enabled. The output log is rather long, but here's the top:
Graphics Feature Status
Canvas: Hardware accelerated
Flash: Hardware accelerated
Flash Stage3D: Hardware accelerated
Flash Stage3D Baseline profile: Hardware accelerated
Compositing: Hardware accelerated
Multiple Raster Threads: Enabled
Out-of-process Rasterization: Disabled
OpenGL: Enabled
Hardware Protected Video Decode: Unavailable
Rasterization: Software only. Hardware acceleration disabled
Skia Renderer: Enabled
Video Decode: Unavailable
Vulkan: Disabled
WebGL: Hardware accelerated
WebGL2: Hardware accelerated
Got it essentially solved. I found an older post explaining that you need to check whether WebGL is using the "real" GPU or just the Intel integrated graphics built into the CPU.
To do this, go to https://alteredqualia.com/tmp/webgl-maxparams-test/, scroll down to the very bottom, and look at the Unmasked Renderer and Unmasked Vendor tags.
In my case, these were showing Intel, not my NVIDIA GPU.
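The same two values that test page shows can be read from a script through the `WEBGL_debug_renderer_info` extension, whose `UNMASKED_VENDOR_WEBGL` and `UNMASKED_RENDERER_WEBGL` constants are passed to `gl.getParameter`. This is a minimal sketch under stated assumptions: `readUnmaskedGpuInfo`, `looksLikeNvidia`, and the small interfaces are hypothetical names declared here so the code does not depend on DOM typings, some browsers may not expose the extension (the sketch then returns `null`), and the NVIDIA check is only a substring heuristic.

```typescript
// Shape of the WEBGL_debug_renderer_info extension object: the two enum
// values it exposes (0x9245 and 0x9246 in the extension spec).
interface DebugRendererInfo {
  readonly UNMASKED_VENDOR_WEBGL: number;
  readonly UNMASKED_RENDERER_WEBGL: number;
}

// The subset of WebGLRenderingContext this sketch needs (assumption: a
// structural stand-in so the snippet compiles without DOM typings).
interface GlContextLike {
  getExtension(name: string): unknown;
  getParameter(pname: number): unknown;
}

interface GpuInfo {
  vendor: string;
  renderer: string;
}

// Returns the unmasked vendor/renderer strings, or null when the browser
// does not expose the WEBGL_debug_renderer_info extension.
function readUnmaskedGpuInfo(gl: GlContextLike): GpuInfo | null {
  const ext = gl.getExtension("WEBGL_debug_renderer_info") as DebugRendererInfo | null;
  if (!ext) return null;
  return {
    vendor: String(gl.getParameter(ext.UNMASKED_VENDOR_WEBGL)),
    renderer: String(gl.getParameter(ext.UNMASKED_RENDERER_WEBGL)),
  };
}

// Heuristic only: looks for common NVIDIA product names in either string.
function looksLikeNvidia(info: GpuInfo): boolean {
  return /nvidia|geforce|quadro/i.test(`${info.vendor} ${info.renderer}`);
}
```

In a browser you could obtain the context with `document.createElement("canvas").getContext("webgl")` and, if it is non-null, log `readUnmaskedGpuInfo(gl)`. If the renderer names the Intel integrated graphics, WebGL (and therefore TF.js) is not running on the NVIDIA card.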
My System76 laptop can run in "Hybrid Graphics" mode, in which big computations are performed on the GPU while smaller things like GUI elements run on the integrated graphics. (This saves battery life.) Some applications can take advantage of the GPU in Hybrid Graphics mode (I just ran a great Adversarial Latent AutoEncoder demo that maxed out my GPU in that mode), but not all can. Chrome is apparently one of the latter.
To get WebGL to see my NVIDIA GPU, I needed to reboot my system in "full NVIDIA Graphics" mode.
After this reboot, some of the TFJS examples use the GPU, such as the Visualizing Training example, which now trains almost instantly instead of taking a few minutes. But the Addition RNN example still uses only the CPU. This may be because of a missing backend declaration that @edkeveked pointed out.
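If a page relies on TF.js picking a backend automatically, one way to rule out a silent CPU fallback is to request WebGL explicitly before building or training any model. `tf.setBackend(name)` resolves to a boolean that says whether the backend initialized, and `tf.getBackend()` confirms what is active afterwards. This is a guess at the kind of "backend declaration" meant here, not the Addition RNN example's actual code; `requireWebglBackend` and `TfBackendControl` are hypothetical names, and the interface is a structural stand-in for the `tf` namespace from `@tensorflow/tfjs`.

```typescript
// Structural stand-in for the @tensorflow/tfjs functions used below
// (assumption: declared here so the sketch compiles without the package).
interface TfBackendControl {
  setBackend(backendName: string): Promise<boolean>;
  ready(): Promise<void>;
  getBackend(): string;
}

// Requests the WebGL backend before any model is built or trained and
// reports whether that request actually took effect.
async function requireWebglBackend(tf: TfBackendControl): Promise<boolean> {
  const initialized = await tf.setBackend("webgl");
  await tf.ready();
  return initialized && tf.getBackend() === "webgl";
}
```

Calling this at the top of a training script and stopping with an error when it returns `false` makes a CPU fallback visible instead of silent.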