How can I use 100% of VRAM on a secondary GPU from a single process on Windows 10?

This is on a Windows 10 computer with no monitor attached to the Nvidia card. I've included output from nvidia-smi showing that more than 5.04G was available.

Here is the TensorFlow code asking it to allocate just slightly more than I had seen previously (I want this to be as close as possible to a memory fraction of 1.0):

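import tensorflow as tf  # TensorFlow 1.x API (tf.ConfigProto / tf.Session)

config = tf.ConfigProto()
# config.gpu_options.allow_growth = True
# Fraction of the GPU's total memory that this process should allocate.
config.gpu_options.per_process_gpu_memory_fraction = 0.84
config.log_device_placement = True
sess = tf.Session(config=config)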

Just before running the code above in a Jupyter notebook, I ran nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.51                 Driver Version: 376.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106... WDDM  | 0000:01:00.0     Off |                  N/A |
|  0%   27C    P8     5W / 120W |     43MiB /  6144MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
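For repeated measurements, the same memory figures can be read in machine-friendly form. The following is a minimal sketch, assuming nvidia-smi is on the PATH (on Windows it may need its full install path) and that the installed driver supports the `--query-gpu` CSV fields used here. The helper names `parse_memory_line` and `query_gpu_memory` are hypothetical.

```python
import subprocess

# Ask nvidia-smi for one CSV line per GPU, values in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line):
    """Parse one CSV line such as 'index, total, used, free' (MiB)."""
    index, total, used, free = [field.strip() for field in line.split(",")]
    return {
        "index": int(index),
        "total_mib": int(total),
        "used_mib": int(used),
        "free_mib": int(free),
    }


def query_gpu_memory():
    """Run nvidia-smi and return one dict per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [parse_memory_line(l) for l in result.stdout.splitlines() if l.strip()]
```

Calling `query_gpu_memory()` immediately before and after creating the session would show how much of the reported free memory the process actually obtained.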

The output from TF, after it successfully allocates 5.01GB, shows "failed to allocate 5.04G (5411658752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY" (scroll to the right to see it below):

2017-12-17 03:53:13.959871: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 5.01GiB
2017-12-17 03:53:13.960006: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2017-12-17 03:53:13.961152: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 5.04G (5411658752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
2017-12-17 03:53:14.151073: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
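The numbers in this log can be checked with a little arithmetic. The sketch below assumes totalMemory is exactly 6 GiB and takes freeMemory as 5.01 GiB, which the log rounds to two decimals, so the resulting fraction is only approximate.

```python
GIB = 1024 ** 3

total_bytes = 6 * GIB              # totalMemory: 6.00GiB (assumed exact)
requested_bytes = 5411658752       # from the "failed to allocate" line
free_bytes = int(5.01 * GIB)       # freeMemory: 5.01GiB (rounded in the log)

# The failed request corresponds to the configured fraction of total memory.
requested_fraction = requested_bytes / total_bytes

# Largest fraction that would still fit inside the reported free memory.
max_fitting_fraction = free_bytes / total_bytes

print(f"requested fraction:   {requested_fraction:.4f}")
print(f"max fitting fraction: {max_fitting_fraction:.4f}")
print(f"shortfall: {(requested_bytes - free_bytes) / 1024 ** 2:.1f} MiB")
```

Under these assumptions the request matches per_process_gpu_memory_fraction=0.84, while anything above roughly 0.835 exceeds what the driver reports as free.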

My best guess is that some policy in an Nvidia user-level DLL is preventing use of all of the memory (perhaps to allow for attaching a monitor?).

If that theory is correct, I'm looking for any user-accessible knob to turn that off on Windows 10. If I'm on the wrong track, any help pointing me in the right direction is appreciated.

Edit #1:

I realized I did not include this bit of research: The following code in tensorflow indicates stream_exec is 'telling' TensorFlow that only 5.01GB is free. This is the primary reason for my current theory that some Nvidia component is preventing the allocation. (However I could be misunderstanding what component implements the instantiated stream_exec.)

auto stream_exec = executor.ValueOrDie();
int64 free_bytes;
int64 total_bytes;
if (!stream_exec->DeviceMemoryUsage(&free_bytes, &total_bytes)) {
  // Logs internally on failure.
  free_bytes = 0;
  total_bytes = 0;
}
const auto& description = stream_exec->GetDeviceDescription();
int cc_major;
int cc_minor;
if (!description.cuda_compute_capability(&cc_major, &cc_minor)) {
  // Logs internally on failure.
  cc_major = 0;
  cc_minor = 0;
}
LOG(INFO) << "Found device " << i << " with properties: "
          << "\nname: " << description.name() << " major: " << cc_major
          << " minor: " << cc_minor
          << " memoryClockRate(GHz): " << description.clock_rate_ghz()
          << "\npciBusID: " << description.pci_bus_id() << "\ntotalMemory: "
          << strings::HumanReadableNumBytes(total_bytes)
          << " freeMemory: " << strings::HumanReadableNumBytes(free_bytes);
}
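One way to take TensorFlow out of the picture is to ask the CUDA driver directly. The sketch below assumes that DeviceMemoryUsage ultimately reports what the driver API call cuMemGetInfo returns; the excerpt above does not confirm this. It loads the driver library through ctypes (nvcuda.dll on Windows, libcuda.so.1 elsewhere), and the helper names are hypothetical.

```python
import ctypes
import sys


def load_cuda_driver():
    """Load the CUDA driver library; the file name differs by platform."""
    if sys.platform == "win32":
        return ctypes.WinDLL("nvcuda.dll")
    return ctypes.CDLL("libcuda.so.1")


def check(result, call_name):
    """Raise if a driver call did not return CUDA_SUCCESS (0)."""
    if result != 0:
        raise RuntimeError(f"{call_name} failed with CUresult {result}")


def to_gib(num_bytes):
    return num_bytes / 1024 ** 3


def query_free_and_total(ordinal=0):
    """Return (free_bytes, total_bytes) as reported by cuMemGetInfo."""
    cuda = load_cuda_driver()
    check(cuda.cuInit(0), "cuInit")
    device = ctypes.c_int()
    check(cuda.cuDeviceGet(ctypes.byref(device), ordinal), "cuDeviceGet")
    context = ctypes.c_void_p()
    check(cuda.cuCtxCreate_v2(ctypes.byref(context), 0, device), "cuCtxCreate")
    try:
        free_bytes = ctypes.c_size_t()
        total_bytes = ctypes.c_size_t()
        check(
            cuda.cuMemGetInfo_v2(ctypes.byref(free_bytes), ctypes.byref(total_bytes)),
            "cuMemGetInfo",
        )
        return free_bytes.value, total_bytes.value
    finally:
        cuda.cuCtxDestroy_v2(context)
```

If `query_free_and_total()` also reports about 5.01 GiB free in a fresh process, the limit is coming from below TensorFlow, which would be consistent with the theory above.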

Edit #2:

The thread below indicates that Windows 10 pervasively prevents full use of VRAM on secondary video cards used for compute, by reserving a percentage of the VRAM: https://social.technet.microsoft.com/Forums/windows/en-US/15b9654e-5da7-45b7-93de-e8b63faef064/windows-10-does-not-let-cuda-applications-to-use-all-vram-on-especially-secondary-graphics-cards?forum=win10itprohardware

The claim in this thread seems implausible, given that it would mean all Windows 10 boxes are inherently worse than Windows 7 for any workload where VRAM on compute-dedicated graphics cards could plausibly be the bottleneck.

Edit #3:

Updated the title to read more clearly as a question. Feedback indicates this may be better filed as a bug with Microsoft or Nvidia, and I am pursuing other avenues to get it addressed. However, I don't want to assume it cannot be resolved directly.
Further experiments indicate that the issue I am hitting is specific to a large allocation from a single process. All of the VRAM can be used when another process comes into play.

Edit #4:

The failure here is an allocation failure. According to the nvidia-smi output above, 43MiB is in use (perhaps by the system?), but not by any identifiable process. The failure I'm seeing is of a single monolithic allocation, which under a typical allocation model requires a contiguous address space. So the pertinent question may be: what is causing that 43MiB to be used, and is it placed in the address space such that 5.01 GB is the largest contiguous block available?
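The contiguity question can be illustrated with a toy model. The sketch below treats VRAM as a flat 6144 MiB range and places the 43 MiB at hypothetical offsets. How the driver actually lays out memory is unknown, so this only shows why placement would matter for a single large allocation. The function `largest_free_gap` is a hypothetical helper.

```python
def largest_free_gap(total_mib, reserved):
    """Return the largest contiguous free span, in MiB.

    reserved is a list of (start_mib, size_mib) ranges that are in use.
    """
    largest = 0
    cursor = 0  # end of the furthest reserved range seen so far
    for start, size in sorted(reserved):
        largest = max(largest, start - cursor)
        cursor = max(cursor, start + size)
    return max(largest, total_mib - cursor)


TOTAL_MIB = 6144  # from the nvidia-smi output
USED_MIB = 43     # from the nvidia-smi output

# Hypothetical placements of the 43 MiB; the real layout is not known.
gap_if_at_start = largest_free_gap(TOTAL_MIB, [(0, USED_MIB)])
gap_if_in_middle = largest_free_gap(TOTAL_MIB, [(3072, USED_MIB)])

print(gap_if_at_start, gap_if_in_middle)
```

In both cases the total free memory is the same 6101 MiB, but the largest single block differs, which is the distinction the question above is asking about.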

Steve Steiner asked Dec 17 '17 12:12


1 Answer

This is not possible for now: Windows Display Driver Model (WDDM) 2.x defines a limit, and no process can legitimately override it.

Assuming you have already tried the "Prefer Maximum Performance" power setting, that can push it to at most 92%.

If you would like to know more about WDDM 2.x, this page covers it in detail:

https://docs.microsoft.com/en-us/windows-hardware/drivers/display/what-s-new-for-windows-threshold-display-drivers--wddm-2-0-

N.K answered Nov 16 '22 01:11