
Implementation of Data Augmentation for Image Classification with Convolutional Neural Networks

I'm doing image classification with cudaconvnet with Daniel Nouri's noccn module, and want to implement data augmentation by taking lots of patches of an original image (and flipping it). When would it be best for this to take place?

I've identified 3 stages in the training process where it could happen:
a) when creating batches from the data
b) when getting the next batch to train on
c) given a batch, when getting the next image to feed into the net

It seems to me the advantage of a) is that I can scatter the augmented data across all the batches. But it would take up 1000x more space on disk; the original dataset is already 1TB, so that's completely infeasible.

b) and c) don't involve storing the new data on disk, but could I still scatter the data across batches? If I don't, then supposing I have batch_size==128 and can augment my data 1000x, the next 8 batches would all contain images from the same class. Isn't that bad for training the net, since the training samples wouldn't be randomised at all?

Furthermore, if I pick b) or c) and create a new batch from k training examples, then augmenting the data n times will make the batch size n*k instead of giving me n times more batches.

For example, in my case I have batch_size==128 and can expect 1000x data augmentation, so each batch would actually be of size 128*1000. All that would buy me is a more accurate estimate of the partial derivatives, and to a useless extent, because a batch size of 128,000 is pointlessly high.

So what should I do?

asked Feb 26 '14 by Alexandre Holden Daly

1 Answer

Right, you'd want the augmented samples interspersed as randomly throughout the rest of the data as possible. Otherwise, you'll definitely run into the problems you've mentioned: the batches won't be properly sampled and your gradient descent steps will be biased. I'm not too familiar with cudaconvnet, as I primarily work with Torch instead, but I do often run into the same situation with artificially augmented data.

Your best bet would be (c), kind of.

For me, the best place to augment the data is right when a sample gets loaded by your trainer's inner loop -- apply the random distortion, flip, crop (or however else you're augmenting your samples) right at that moment and to that single data sample. What this will accomplish is that every time the trainer tries to load a sample, it will actually receive a modified version which will probably be different from any other image it has seen at a previous iteration.
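As a rough illustration of that per-sample step in plain numpy (a sketch only, not noccn or cudaconvnet API; the `augment` name, the square crop, and the 50% flip probability are all placeholder choices):

```python
import numpy as np

def augment(img, crop_size, rng=np.random):
    """Return one random crop of img, flipped horizontally half the time.

    img is assumed to be an H x W x C numpy array; crop_size is the
    side length of the square patch fed to the net.
    """
    h, w = img.shape[:2]
    # pick a random top-left corner for the patch
    top = rng.randint(0, h - crop_size + 1)
    left = rng.randint(0, w - crop_size + 1)
    patch = img[top:top + crop_size, left:left + crop_size]
    # flip horizontally with probability 0.5
    if rng.rand() < 0.5:
        patch = patch[:, ::-1]
    return patch
```

Because the crop offsets and the flip are re-drawn on every call, loading the same source image twice almost never yields the same patch.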

Then, of course, you will need to adjust something else to still get the 1000x data size factor in. Either:

  1. Ideally, load more batches per epoch after the inner loop has finished processing the first set. If you have your augmenter set up right, every batch will continue getting random samples so it will all work out well. Torch allows doing this, but it's somewhat tricky and I'm not sure if you'd be able to do the same in cudaconvnet.
  2. Otherwise, simply run the trainer for 1000x more training epochs. Not as elegant, but the end result will be the same. If you later need to report the number of epochs you actually trained for, divide the real count by 1000 to get a more appropriate estimate relative to your 1000x augmented dataset.
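To make option 2 concrete, here is a sketch of what such a training loop could look like (hypothetical names throughout: `train_on_batch` and `augment` stand in for whatever your framework provides, and none of this is cudaconvnet or noccn API):

```python
import numpy as np

def train(train_images, train_labels, train_on_batch, augment,
          base_epochs=10, aug_factor=1000, batch_size=128, crop_size=24):
    """Run aug_factor times more epochs, augmenting each sample freshly
    every time it is loaded, so batches stay at batch_size."""
    labels = np.asarray(train_labels)
    n = len(train_images)
    rng = np.random.RandomState(0)
    for epoch in range(base_epochs * aug_factor):
        order = rng.permutation(n)  # reshuffle the dataset every pass
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # every load produces a fresh random view of each image
            batch = np.stack([augment(train_images[i], crop_size)
                              for i in idx])
            train_on_batch(batch, labels[idx])
```

The key property is that the shuffle happens over the original n samples each epoch, so class distribution within a batch matches the original data, while the augmentation still effectively multiplies the dataset.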

This way, your target classes will stay as randomly distributed throughout your dataset as the original data was, without consuming any extra disk space to cache augmented samples. This comes, of course, at the cost of additional computing power, since you'd be generating the samples on demand at every step along the way, but you already know that...

Additionally, and perhaps more importantly, your batches will stay at the original size of 128, so the mini-batching process remains untouched and your learned parameter updates will continue to land at the same frequency you'd expect otherwise. The same process would also work for pure SGD training (batch size = 1), since the trainer would never see the "same" image twice.

Hope that helps.

answered Sep 18 '22 by monoeci