
Sending large images to the GPU takes too long

I am using a CNN model (AlexNet) implemented in Lua using Torch for image processing. I am modifying the Torch starter code.

My problem is that I am training the model on images with 18 channels instead of 3, and sending those images to the GPU takes roughly 15 times as long (2.13 s per batch) as sending 3-channel images (0.14 s per batch). I also timed images with 4 channels: as soon as the number of channels rises above 3, the transfer time jumps by roughly the same factor. Even 4-channel images take around 2 s per batch, about 14 times more than 3-channel ones.
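For reference, a minimal way to time such a copy in Torch (the sizes below are made up; `cutorch.synchronize()` guards against timing an asynchronous copy):

    require 'torch'
    require 'cutorch'

    -- hypothetical batch: 32 images of 18 channels at 224x224
    local batch = torch.FloatTensor(32, 18, 224, 224):uniform()

    local timer = torch.Timer()
    local gpuBatch = batch:cuda()  -- host -> device copy
    cutorch.synchronize()          -- wait until the copy has finished
    print(string.format('transfer took %.3f s', timer:time().real))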

I was wondering whether there is a bug that makes it take this long, and if not, whether there is any way to reduce this transfer time.

asked Jun 15 '17 by M.es

1 Answer

Short Answer

This problem isn't going away: it is a bandwidth issue on the CPU-to-GPU bus. Going from 3 to 18 channels increases the amount of data that has to be sent across the bus by a factor of six.
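To put rough numbers on it (the batch and image sizes here are assumed, since the question does not state them), the data volume alone grows six-fold:

    -- assumed: batches of 128 float images at 224x224 (4 bytes per element)
    local perImage3  = 3  * 224 * 224 * 4   -- ~0.57 MiB per 3-channel image
    local perImage18 = 18 * 224 * 224 * 4   -- ~3.45 MiB per 18-channel image
    print(128 * perImage3  / 2^20)          -- ~73.5 MiB per batch
    print(128 * perImage18 / 2^20)          -- ~441 MiB per batch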

Possible Workaround

The essence of what you are trying to do is to include previous frames in your model. If that is what you want to accomplish, there is another way of doing it.

Suppose a training batch were not a random selection of pre-stacked images, but instead a run of ordinary images that are sequential in time.

In that case you would send images with just 3 channels each, only in temporal order rather than shuffled.

Let's explore that what-if.

First, you could still get random sampling by choosing a random video and a random start/end time for each batch.
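A minimal sketch of that sampling (the `videos` table and `clipLen` parameter are hypothetical names, not from the original post):

    -- `videos` is assumed to be a Lua table of [numFrames x 3 x H x W] tensors
    -- returns `clipLen` consecutive frames starting at a random position
    local function sampleClip(videos, clipLen)
      local vid   = videos[torch.random(1, #videos)]  -- random video
      local start = torch.random(1, vid:size(1) - clipLen + 1)
      return vid:narrow(1, start, clipLen)
    end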

Secondly, once the [batch, height, width, channel] tensor is on the GPU, you could generate a new tensor there:

    diffTensor = [ batch[1:], height, width, channel ] - [ batch[:-1], height, width, channel ]

and then concatenate the following tensors along the channel dimension

    origTensor [ batch[5:],   height, width, channel ]   -- frame t
    diffTensor [ batch[4:],   height, width, channel ]   -- t   minus t-1
    diffTensor [ batch[3:-1], height, width, channel ]   -- t-1 minus t-2
    diffTensor [ batch[2:-2], height, width, channel ]   -- t-2 minus t-3
    diffTensor [ batch[1:-3], height, width, channel ]   -- t-3 minus t-4
    diffTensor [ batch[0:-4], height, width, channel ]   -- t-4 minus t-5

That gives each remaining frame 5 "look backs": its own 3 channels plus 5 diff slices, for 18 channels in total.
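Here is a rough Torch sketch of that construction (the helper name is mine; `batch` is assumed to be a CudaTensor of consecutive frames already on the GPU, in Torch's [N x C x H x W] layout rather than the channels-last notation above):

    -- `batch`: [N x 3 x H x W] CudaTensor of consecutive frames
    local function stackWithDiffs(batch, lookBack)  -- lookBack = 5 here
      local n = batch:size(1)
      -- frame-to-frame differences: diff[i] = batch[i+1] - batch[i]
      local diff = batch:narrow(1, 2, n - 1) - batch:narrow(1, 1, n - 1)
      -- start from the frames that have a full history of `lookBack` diffs
      local parts = { batch:narrow(1, lookBack + 1, n - lookBack) }
      for k = 1, lookBack do
        -- k = 1 is batch[t] - batch[t-1]; k = lookBack is the oldest diff
        parts[#parts + 1] = diff:narrow(1, lookBack + 1 - k, n - lookBack)
      end
      return torch.cat(parts, 2)  -- [(N-lookBack) x (3 + 3*lookBack) x H x W]
    end

Only the plain 3-channel frames ever cross the bus; the 18-channel stacks are assembled on the device.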

What would this accomplish? Well, if you sent 100 images to the GPU, the network could build 95 image+diff stacks for the bus cost of only 100 images, whereas sending 95 pre-stacked multi-channel images would cost the equivalent of several hundred plain images. Basically, you cut your transfer costs by roughly 5x.
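Spelling out the count for a look-back of 5, with N frames per transfer (numbers for N = 100):

    images crossing the bus (this scheme):       N            -- 100
    usable stacked samples, built on-GPU:        N - 5        -- 95
    equivalent images if stacks were pre-built:  6 * (N - 5)  -- 570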

answered Oct 06 '22 by Anton Codes