
Stochastic Gradient Descent (SGD) vs Mini-batch size 1

Is stochastic gradient descent basically the name given to mini-batch training where the batch size is 1 and the training rows are selected at random? In other words, is it the same as 'normal' gradient descent, and it's just the way the training data is supplied that makes the difference?

One thing that confuses me is that I've seen people say that even with SGD you can supply more than one data point and use larger batches, so wouldn't that just make it 'normal' mini-batch gradient descent?

asked Oct 14 '25 by BigBadMe


1 Answer

On Optimization Terminology

Optimization algorithms that use only a single example at a time are sometimes called stochastic, as you mentioned. Optimization algorithms that use the entire training set are called batch or deterministic gradient methods.

Most algorithms used for deep learning fall somewhere in between, using more than one but fewer than all the training examples. These were traditionally called minibatch or minibatch stochastic methods, and it is now common to call them simply stochastic methods.

Hope that makes the terminology clearer:

Deep Learning by Goodfellow, Bengio, and Courville, pp. 275-276
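To make the distinction concrete, here is a minimal sketch (not from the book or the question) of a single update loop that covers all three named variants, assuming a linear-regression model with a mean-squared-error loss; only the `batch_size` argument changes.

```python
# Sketch only: illustrates how one update loop covers "stochastic",
# "minibatch", and "batch" gradient descent, depending on batch_size.
# The model (linear regression with MSE loss) is an assumption for the example.
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.01, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Shuffle once per epoch, then walk through the data in batches.
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of the MSE loss computed on this batch only.
            grad = 2.0 / len(batch) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad
    return w

# Synthetic data just for demonstration.
X = np.random.randn(200, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * np.random.randn(200)

w_sgd       = gradient_descent(X, y, batch_size=1)       # "stochastic"
w_minibatch = gradient_descent(X, y, batch_size=32)      # "minibatch"
w_batch     = gradient_descent(X, y, batch_size=len(X))  # "batch"/deterministic
```

With `batch_size=1` every update uses a single randomly drawn example, which is what "stochastic" originally referred to; in current usage the same loop with a larger `batch_size` is still commonly called SGD.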

answered Oct 19 '25 by mrk


