
What is the default batch size of PyTorch SGD?

What does PyTorch's SGD do if I feed it the whole dataset and do not specify a batch size? I don't see anything "stochastic" or "random" in this case. For example, in the following simple code, I feed the whole dataset (x, y) into a model.

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(5):
    y_pred = model(x_data)               # forward pass over the entire dataset
    loss = criterion(y_pred, y_data)     # loss computed from all 100 samples
    optimizer.zero_grad()                # clear gradients from the previous step
    loss.backward()                      # backpropagate through the full batch
    optimizer.step()                     # one parameter update per epoch

Suppose there are 100 data pairs (x, y), i.e., x_data and y_data each have 100 elements.

Question: It seems to me that all 100 gradients are calculated before a single parameter update, so the size of the "mini-batch" is 100, not 1, and there is no randomness. Am I right? At first I thought SGD meant randomly choosing 1 data point and calculating its gradient, which is then used as an approximation of the true gradient over all the data.

Asked by Tony B on Feb 05 '20


People also ask

What is batch size in SGD?

batch_size is the number of samples used to compute each update. Here, batch_size=1 means each update is computed from a single sample; by your definition, that would be SGD. If you have batch_size=len(train_data), then each update to your weights requires the gradient computed from your entire dataset.
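
As a rough sketch of those two extremes in PyTorch (the linear model, loss, and random data below are made-up placeholders, not anything from the question):

import torch

# toy setup: a linear model and 100 (x, y) pairs (all placeholders)
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(100, 1), torch.randn(100, 1)

# batch_size = 1: one update per sample ("true" SGD)
for i in torch.randperm(len(x)).tolist():   # visit samples in random order
    loss = criterion(model(x[i:i+1]), y[i:i+1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                        # 100 updates per pass

# batch_size = len(x): one update from the whole dataset (gradient descent)
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()                            # 1 update per pass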

What is PyTorch batch size?

In a PyTorch DataLoader, the batch size is the number of samples processed before the model is updated. If you do not batch the data at all and feed the whole training set at once, the effective batch size is equal to the number of samples in the training data.
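
For concreteness, a small sketch of how a DataLoader's batch_size controls the number of updates per epoch (the tensors below are made-up placeholder data):

import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(100, 3)                     # 100 samples, 3 features (placeholder)
y = torch.randn(100, 1)
dataset = TensorDataset(x, y)

mini = DataLoader(dataset, batch_size=10, shuffle=True)
print(len(mini))                            # 10 -> 10 mini-batches, 10 updates per epoch

full = DataLoader(dataset, batch_size=len(dataset))
print(len(full))                            # 1 -> a single full-batch update per epoch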

How does batch size affect SGD?

The size of the mini-batches essentially determines the frequency of updates: the smaller the mini-batches, the more updates per epoch. At one extreme (mini-batch = whole dataset) you have batch gradient descent. At the other extreme (mini-batch = one sample) you have fully per-sample SGD.

How does SGD work in PyTorch?

In PyTorch, the SGD optimizer supports several variants. Setting the momentum parameter to 0 (the default) gives you plain SGD. If momentum > 0 with nesterov=False (also the default), you use momentum without the lookahead step, i.e., classical (heavy-ball) momentum; setting nesterov=True adds the lookahead.
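
As a brief sketch of those constructor options (the linear layer is only a placeholder to supply parameters):

import torch

model = torch.nn.Linear(4, 2)               # placeholder model
params = list(model.parameters())

plain     = torch.optim.SGD(params, lr=0.01)                   # momentum=0 (default): plain SGD
classical = torch.optim.SGD(params, lr=0.01, momentum=0.9)     # classical (heavy-ball) momentum
nesterov  = torch.optim.SGD(params, lr=0.01, momentum=0.9,
                            nesterov=True)                     # momentum with the lookahead step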


1 Answer

The SGD optimizer in PyTorch is just gradient descent. The stochastic part comes from the fact that you usually pass a random subset of your data through the network at a time (i.e. a mini-batch or batch). The code you posted passes the entire dataset through the network on each epoch before doing backprop and stepping the optimizer, so you're really just doing regular (full-batch) gradient descent.
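
For example, a minimal sketch of how the posted loop could be turned into mini-batch SGD with a DataLoader, reusing the question's model, criterion, optimizer, x_data and y_data; the batch size of 10 and shuffle=True are illustrative choices, not anything the question specifies:

from torch.utils.data import TensorDataset, DataLoader

# shuffle=True supplies the randomness: each epoch the 100 samples
# are drawn in a new random order, 10 at a time
loader = DataLoader(TensorDataset(x_data, y_data), batch_size=10, shuffle=True)

for epoch in range(5):
    for x_batch, y_batch in loader:
        y_pred = model(x_batch)
        loss = criterion(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                    # 10 updates per epoch instead of 1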

Answered by jodag on Sep 19 '22