
What is the default batch size of PyTorch SGD?

What does PyTorch's SGD do if I feed it the whole dataset and do not specify a batch size? I don't see anything "stochastic" or "random" in this case. For example, in the following simple code, I feed the whole dataset (x, y) into a model.

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(5):
    y_pred = model(x_data)               # forward pass over the entire dataset
    loss = criterion(y_pred, y_data)     # loss computed from all 100 samples
    optimizer.zero_grad()                # clear gradients from the previous step
    loss.backward()                      # backpropagate through the full batch
    optimizer.step()                     # one parameter update per epoch

Suppose there are 100 data pairs (x, y), i.e., x_data and y_data each have 100 elements.

Question: It seems to me that all 100 gradients are calculated before a single parameter update, so the size of the "mini-batch" is 100, not 1, and there is no randomness. Am I right? At first I thought SGD meant randomly choosing 1 data point and calculating its gradient, which is then used as an approximation of the true gradient over all the data.

Asked by Tony B on Feb 05 '20


People also ask

What is batch size in SGD?

batch_size is the number of samples used to compute each update. Here, batch_size=1 means each update is computed from a single sample; by your definition, that would be SGD. If you have batch_size=len(train_data), then each update to your weights requires the gradient computed from your entire dataset.
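
As a rough sketch of those two extremes in PyTorch (the linear model, loss, and random data below are made-up placeholders, not anything from the question):

import torch

# toy setup: a linear model and 100 (x, y) pairs (all placeholders)
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(100, 1), torch.randn(100, 1)

# batch_size = 1: one update per sample ("true" SGD)
for i in torch.randperm(len(x)).tolist():   # visit samples in random order
    loss = criterion(model(x[i:i+1]), y[i:i+1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                        # 100 updates per pass

# batch_size = len(x): one update from the whole dataset (gradient descent)
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()                            # 1 update per pass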

What is PyTorch batch size?

In a PyTorch DataLoader, the batch size is the number of samples processed before the model is updated. If you do not batch the data at all and feed the whole training set at once, the effective batch size is equal to the number of samples in the training data.
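
For concreteness, a small sketch of how a DataLoader's batch_size controls the number of updates per epoch (the tensors below are made-up placeholder data):

import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(100, 3)                     # 100 samples, 3 features (placeholder)
y = torch.randn(100, 1)
dataset = TensorDataset(x, y)

mini = DataLoader(dataset, batch_size=10, shuffle=True)
print(len(mini))                            # 10 -> 10 mini-batches, 10 updates per epoch

full = DataLoader(dataset, batch_size=len(dataset))
print(len(full))                            # 1 -> a single full-batch update per epoch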

How does batch size affect SGD?

The size of the mini-batches essentially determines the frequency of updates: the smaller the mini-batches, the more updates per epoch. At one extreme (mini-batch = whole dataset) you have batch gradient descent. At the other extreme (mini-batch = one sample) you have fully per-sample SGD.

How does SGD work in PyTorch?

In PyTorch, the SGD optimizer supports several variants. Setting the momentum parameter to 0 (the default) gives you plain SGD. If momentum > 0 with nesterov=False (also the default), you use momentum without the lookahead step, i.e., classical (heavy-ball) momentum; setting nesterov=True adds the lookahead.
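
As a brief sketch of those constructor options (the linear layer is only a placeholder to supply parameters):

import torch

model = torch.nn.Linear(4, 2)               # placeholder model
params = list(model.parameters())

plain     = torch.optim.SGD(params, lr=0.01)                   # momentum=0 (default): plain SGD
classical = torch.optim.SGD(params, lr=0.01, momentum=0.9)     # classical (heavy-ball) momentum
nesterov  = torch.optim.SGD(params, lr=0.01, momentum=0.9,
                            nesterov=True)                     # momentum with the lookahead step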


1 Answer

The SGD optimizer in PyTorch is just gradient descent. The stochastic part comes from the fact that you usually pass a random subset of your data through the network at a time (i.e. a mini-batch or batch). The code you posted passes the entire dataset through the network on each epoch before doing backprop and stepping the optimizer, so you're really just doing regular (full-batch) gradient descent.
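
For example, a minimal sketch of how the posted loop could be turned into mini-batch SGD with a DataLoader, reusing the question's model, criterion, optimizer, x_data and y_data; the batch size of 10 and shuffle=True are illustrative choices, not anything the question specifies:

from torch.utils.data import TensorDataset, DataLoader

# shuffle=True supplies the randomness: each epoch the 100 samples
# are drawn in a new random order, 10 at a time
loader = DataLoader(TensorDataset(x_data, y_data), batch_size=10, shuffle=True)

for epoch in range(5):
    for x_batch, y_batch in loader:
        y_pred = model(x_batch)
        loss = criterion(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                    # 10 updates per epoch instead of 1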

Answered by jodag on Sep 19 '22