When running a TensorFlow job I sometimes get a non-fatal error saying GPU memory was exceeded, and then I see the "Shared GPU memory" usage go up in Performance Monitor on Windows 10.
How does TensorFlow achieve this? I have looked through the CUDA documentation and found no reference to the Dedicated and Shared concepts used in Performance Monitor. There is a Shared Memory concept in CUDA, but I think it is something on the device, not the RAM I see in Performance Monitor, which is allocated from CPU RAM by the BIOS.
Note: A similar question was asked by another poster but never answered.
Shared memory in Windows 10 does not refer to the same concept as CUDA shared memory (or local memory in OpenCL); it refers to host-side memory that the GPU can access. For integrated graphics, host and device memory are usually one and the same, since the CPU and GPU sit on the same die and can access the same RAM. For dedicated graphics cards with their own memory, it is separate memory allocated on the host side for use by the GPU.
Shared memory in compute APIs, such as GLSL compute shaders or Nvidia CUDA kernels, refers to a programmer-managed cache layer (sometimes referred to as "scratchpad memory"). On Nvidia devices it exists per SM, can only be accessed by a single SM, and is usually between 32 KB and 96 KB per SM. Its purpose is to speed up access to frequently reused data; see the sketch below.
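To make the distinction concrete, here is a minimal sketch of that programmer-managed cache, written with Numba's CUDA bindings (an assumption on my part; the kernel, the names, and the tile size of 32 are purely illustrative):

```python
import numpy as np
from numba import cuda, float32

TILE = 32  # threads per block; also the length of the shared buffer

@cuda.jit
def reverse_within_block(src, dst):
    # Stage TILE elements in on-chip shared memory, then write them
    # back in reversed order. This buffer lives on the SM itself,
    # not in the "shared GPU memory" that Windows reports.
    tile = cuda.shared.array(shape=TILE, dtype=float32)
    i = cuda.grid(1)          # absolute thread index
    t = cuda.threadIdx.x      # index within the block
    if i < src.size:
        tile[t] = src[i]
    cuda.syncthreads()        # all loads must finish before reads
    if i < src.size:
        dst[i] = tile[TILE - 1 - t]

data = np.arange(64, dtype=np.float32)
out = np.zeros_like(data)
reverse_within_block[2, TILE](data, out)  # 2 blocks of 32 threads
print(out[:4])  # [31. 30. 29. 28.]
```

The point of the example is that the `tile` buffer is explicitly declared and sized by the programmer and is visible only within one block on one SM, which is nothing like the host RAM that Windows labels "shared GPU memory".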
If you see an increase in shared memory use in TensorFlow, you have a dedicated graphics card, and you are hitting "GPU memory exceeded" errors, it most likely means you are using too much memory on the GPU itself, so memory is being allocated from elsewhere (i.e. from system RAM). This can make your program much slower, since the bandwidth and latency of non-device memory are much worse for a dedicated graphics card. One way to keep allocations on the device is sketched below.
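A hedged mitigation, assuming TensorFlow 2.x (this is the standard TF 2.x GPU-configuration interface; the 4096 MB cap is an arbitrary example, not a recommendation):

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Option A: grow GPU memory on demand instead of grabbing it
    # all up front, which reduces accidental over-allocation.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option B (use instead of A, not together with it): hard-cap
    # usable VRAM so TensorFlow fails with an OOM error on the
    # device rather than spilling into slower shared system memory.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])
```

Either call must run before the first GPU operation initializes the device, otherwise TensorFlow raises a RuntimeError.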