Possibly an ANN 101 question regarding mini-batch processing. Google didn't seem to have the answer, and a search here didn't yield anything either. My guess is there's a book somewhere that says, "do it this way!" and I just haven't read that book.
I'm coding a neural net in Python (not that the language matters). I'm attempting to add mini-batch updates instead of full batch. Is it necessary to select each observation once for each epoch? Mini-batches would be data values 1:10, 11:20, 21:30, etc. so that all observations are used, and they are all used once.
Or is it correct to select the mini batch randomly from the training data set based on a probability? The result being that each observation may be used once, multiple times, or not at all in any given epoch. For 20 mini-batches per epoch, each data element would be given a 5% chance of being selected for any given mini-batch. Mini batches would be randomly selected and random in size but approximately 1 of every 20 data points would be included in each of 20 mini batches with no guarantee of selection.
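To make the second scheme concrete, here is a minimal sketch of it using only the standard library. The function name `bernoulli_batches` and the figures (200 observations, 20 mini-batches) are made up for illustration; they are not from any particular library.

```python
import random

def bernoulli_batches(n_samples, n_batches, seed=0):
    """Each index joins each mini-batch independently with probability 1/n_batches."""
    rng = random.Random(seed)
    p = 1.0 / n_batches
    return [
        [i for i in range(n_samples) if rng.random() < p]
        for _ in range(n_batches)
    ]

# Hypothetical figures: 200 observations, 20 mini-batches per epoch (5% chance each).
batches = bernoulli_batches(n_samples=200, n_batches=20)
sizes = [len(batch) for batch in batches]
used = {i for batch in batches for i in batch}

print("batch sizes:", sizes)  # sizes vary around 10
print("observations never selected this epoch:", 200 - len(used))
```

With these numbers, the chance that a given observation is skipped for the whole epoch is 0.95^20, about 36%, which is the "no guarantee of selection" effect described above.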
Some tips regarding mini-batch training:
Shuffle your samples before every epoch
The reason is the same as for shuffling the samples in online training: otherwise the network might simply memorize the order in which you feed it the samples.
Use a fixed batch size for every batch and for every epoch
There is probably also a statistical reason, but the main practical benefit is that it simplifies the implementation: a fixed size lets you use fast matrix-multiplication routines (e.g. BLAS) for your calculations.
Adapt your learning rate to the batch size
For larger batches you'll have to use a smaller learning rate, otherwise the ANN tends to converge towards a sub-optimal minimum. I always scaled my learning rates by 1/sqrt(n), where n is the batch size. Please note that this is just an empirical value from experiments.
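A rough sketch of the three tips together, standard library only. The helper names `scaled_learning_rate` and `fixed_size_batches` are hypothetical, the base rate of 0.1 is arbitrary, and dropping the incomplete final batch is just one assumed way of keeping the batch size fixed.

```python
import math
import random

def scaled_learning_rate(base_lr, batch_size):
    """Empirical 1/sqrt(n) rule from the tip above, not a universal law."""
    return base_lr / math.sqrt(batch_size)

def fixed_size_batches(n_samples, batch_size, rng):
    """Shuffle the indices, then cut them into equally sized batches.

    Assumption: leftover samples that do not fill a whole batch are
    dropped for this epoch so that every batch has the same size.
    """
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_full = n_samples // batch_size
    return [indices[k * batch_size:(k + 1) * batch_size] for k in range(n_full)]

rng = random.Random(0)
lr = scaled_learning_rate(base_lr=0.1, batch_size=16)  # 0.1 / sqrt(16) = 0.025
epoch_batches = fixed_size_batches(n_samples=100, batch_size=16, rng=rng)
```

Call `fixed_size_batches` again at the start of every epoch so that each epoch sees a fresh shuffle; because the order changes, the samples dropped in one epoch are likely to be picked up in another.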
Your first guess is correct. Just randomize your dataset first. Then, for a mini-batch size of (say) 20, use samples 1-20, then 21-40, and so on, so that all of your dataset is used.
But that doesn't mean the dataset is used only once. You normally need to run multiple epochs over the whole dataset for your network to learn properly.
Mini-batches are primarily used to speed up the learning process.
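As an illustration of that loop, here is a small sketch that shuffles once per epoch, walks through the data in consecutive slices, and repeats for many epochs. A one-variable linear model stands in for the neural net to keep it short; the function name `train_minibatch`, the toy data, and the hyperparameters are all assumptions chosen for the example.

```python
import random

def train_minibatch(xs, ys, batch_size=20, epochs=200, lr=0.5, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # new random order every epoch
        # Consecutive slices: every sample is used exactly once per epoch.
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (noise-free, purely illustrative).
xs = [i / 99 for i in range(100)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = train_minibatch(xs, ys)
print(w, b)  # should end up close to 2 and 1
```

Each epoch here performs five parameter updates instead of one, which is where the speed-up over full-batch training comes from.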