 

How are multiple GPUs utilized in Caffe?

Tags:

caffe

I want to know how Caffe utilizes multiple GPUs, so that I can decide whether to upgrade to a new, more powerful card or just buy the same card again and run the two in SLI.
For example, am I better off buying one Titan X 12 GB, or two GTX 1080 8 GB cards?
If I SLI the 1080s, will my effective memory be doubled? That is, can I run a network that needs 12 GB or more of VRAM with them, or am I left with only 8 GB? How is memory utilized in such scenarios? What would happen if two different cards were installed (both NVIDIA)? Does Caffe utilize the available memory the same way? (Suppose one 980 and one 970!)

Hossein asked Dec 21 '16 at 16:12


3 Answers

For example, am I better off buying one Titan X 12 GB, or two GTX 1080 8 GB cards? If I SLI the 1080s, will my effective memory be doubled? I mean, can I run a network that needs 12 GB or more of VRAM with them, or am I left with only 8 GB?

No. The effective memory size in the case of two GPUs with 8 GB of RAM each is still 8 GB, but the effective batch size is doubled, which leads to more stable and faster training.
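To make that concrete, here is a tiny, purely illustrative Python sketch of the arithmetic; the numbers are assumptions, not measurements:

    # Illustrative only: batch size and memory under Caffe-style data parallelism.
    per_gpu_batch = 32        # batch_size from train_val.prototxt; it applies per GPU
    num_gpus = 2              # e.g. two GTX 1080 cards
    effective_batch = per_gpu_batch * num_gpus   # 64 samples contribute to each update

    per_gpu_memory_gb = 8     # VRAM of each card
    # Memory is NOT pooled: the model plus one per-GPU batch must still fit in 8 GB.
    usable_model_memory_gb = per_gpu_memory_gb

    print(effective_batch, usable_model_memory_gb)   # -> 64 8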

What would happen if two different cards were installed (both NVIDIA)? Does Caffe utilize the available memory the same way? (Suppose one 980 and one 970!)

I think you will be limited by the weaker card and may have problems with the drivers, so I don't recommend trying this configuration. Also, from the documentation:

Current implementation has a "soft" assumption that the devices being used are homogeneous. In practice, any devices of the same general class should work together, but performance and total size is limited by the smallest device being used. e.g. if you combine a TitanX and a GTX980, performance will be limited by the 980. Mixing vastly different levels of boards, e.g. Kepler and Fermi, is not supported.

Summing up: with a GPU that has lots of RAM you can train deeper models; with multiple GPUs you can train a single model faster, or train a separate model per GPU. I would choose the single GPU with more memory (Titan X), because deep networks nowadays are mostly RAM-bound (e.g. ResNet-152 or some semantic segmentation networks), and more memory gives you the opportunity to run deeper networks with a larger batch size. Otherwise, if your task fits on a single GPU (GTX 1080), you can buy 2 or 4 of them just to make things faster.

Also, here is some info about multi-GPU support in Caffe:

The current implementation uses a tree reduction strategy. e.g. if there are 4 GPUs in the system, 0:1, 2:3 will exchange gradients, then 0:2 (top of the tree) will exchange gradients, 0 will calculate updated model, 0->2, and then 0->1, 2->3.

https://github.com/BVLC/caffe/blob/master/docs/multigpu.md
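To make that pattern concrete, here is a rough sketch in plain Python/NumPy of the 4-GPU tree reduction described above. It is an illustration of the idea, not Caffe's actual implementation; the arrays stand in for per-GPU gradient buffers, and the learning rate is an arbitrary example value:

    import numpy as np

    num_gpus = 4
    dim = 10
    # Each "GPU" computes a gradient from its own slice of the batch.
    grads = [np.random.randn(dim) for _ in range(num_gpus)]

    # Reduce up the tree: 0:1 and 2:3 exchange, then 0:2 (top of the tree).
    grads[0] += grads[1]
    grads[2] += grads[3]
    grads[0] += grads[2]      # GPU 0 now holds the sum of all four gradients

    # GPU 0 computes the updated model (plain SGD step, averaged over GPUs).
    weights = np.zeros(dim)
    lr = 0.01
    weights -= lr * grads[0] / num_gpus

    # Broadcast the new model back down the tree: 0 -> 2, then 0 -> 1 and 2 -> 3.
    replicas = [weights.copy() for _ in range(num_gpus)]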

mrgloom answered Sep 23 '22 at 04:09


I don't believe Caffe supports SLI mode. The two GPUs are treated as separate cards.

When you run Caffe and add the '-gpu' flag (assuming you are using the command line), you can specify which GPU to use (-gpu 0 or -gpu 1 for example). You can also specify multiple GPUs (-gpu 0,1,3) including using all GPUs (-gpu all).
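For example, assuming a hypothetical solver.prototxt and the stock caffe binary on your PATH, the invocations could be driven from Python like this (a sketch, not official tooling; running the same commands directly in a shell works just as well):

    import subprocess

    solver = "models/my_model/solver.prototxt"   # hypothetical path

    # Train on GPU 0 only:
    subprocess.run(["caffe", "train", "-solver", solver, "-gpu", "0"], check=True)

    # Train on three specific GPUs:
    # subprocess.run(["caffe", "train", "-solver", solver, "-gpu", "0,1,3"], check=True)

    # Train on every visible GPU:
    # subprocess.run(["caffe", "train", "-solver", solver, "-gpu", "all"], check=True)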

When you execute with multiple GPUs, Caffe runs the training across all of the GPUs and then merges the training updates across the models. This effectively doubles (or more, if you have more than two GPUs) the batch size for each iteration.

In my case, I started with an NVIDIA GTX 970 (4 GB card) and then upgraded to an NVIDIA GTX Titan X (Maxwell version with 12 GB) because my models were too large to fit in the GTX 970. I can run some of the smaller models across both cards (even though they are not the same) as long as the model fully fits into the 4 GB of the smaller card. Using the standard ImageNet model, I could execute across both cards and cut my training time in half.

If I recall correctly, other frameworks (TensorFlow and maybe Microsoft's CNTK) support splitting a model among different nodes to effectively increase the available GPU memory in the way you are describing. Although I haven't personally tried either one, I understand you can define on a per-layer basis where each layer executes.

Patrick



Perhaps a late answer, but Caffe supports GPU parallelism, which means you can indeed fully utilize both GPUs. However, I do recommend getting two GPUs of equal memory size, since I don't think Caffe lets you select the batch size per GPU.

As for how memory is utilized: with multiple GPUs, each GPU gets a batch of the batch size specified in your train_val.prototxt, so if your batch size is, for example, 16 and you're using 2 GPUs, you'd have an effective batch size of 32.

Finally, I know that for things such as gaming, SLI seems to be much less effective and often much more problematic than having a single, more powerful GPU. So if you are planning on using the GPUs for more than just deep learning, I'd recommend you still go for the Titan X.

Teo Cherici answered Sep 22 '22 at 04:09