I am watching some videos for Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but do not quite understand how to calculate the analytical gradient of the softmax loss function using numpy.
From this stackexchange answer, the softmax gradient is calculated as:

dw_j = (p(y_i = j) - Ind{y_i = j}) * x_i,   where   p(y_i = j) = e^{f_j} / \sum_k e^{f_k}

The Python implementation of the above is:
num_classes = W.shape[0]
num_train = X.shape[1]
for i in range(num_train):
    for j in range(num_classes):
        p = np.exp(f_i[j]) / sum_i
        dW[j, :] += (p - (j == y[i])) * X[:, i]
Could anyone explain how the above snippet works? The detailed implementation of the softmax loss is also included below.
def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops)
    Inputs:
    - W: C x D array of weights
    - X: D x N array of data. Data are D-dimensional columns
    - y: 1-dimensional array of length N with labels 0...K-1, for K classes
    - reg: (float) regularization strength
    Returns:
    a tuple of:
    - loss as single float
    - gradient with respect to weights W, an array of same size as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    #############################################################################
    # Compute the softmax loss and its gradient using explicit loops.           #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    # Get shapes
    num_classes = W.shape[0]
    num_train = X.shape[1]
    for i in range(num_train):
        # Compute vector of scores
        f_i = W.dot(X[:, i])  # in R^{num_classes}
        # Normalization trick to avoid numerical instability, per http://cs231n.github.io/linear-classify/#softmax
        log_c = np.max(f_i)
        f_i -= log_c
        # Compute loss (and add to it, divided later)
        # L_i = - f(x_i)_{y_i} + log \sum_j e^{f(x_i)_j}
        sum_i = 0.0
        for f_i_j in f_i:
            sum_i += np.exp(f_i_j)
        loss += -f_i[y[i]] + np.log(sum_i)
        # Compute gradient
        # dw_j = 1/num_train * \sum_i[x_i * (p(y_i = j) - Ind{y_i = j})]
        # Here we are computing the contribution to the inner sum for a given i.
        for j in range(num_classes):
            p = np.exp(f_i[j]) / sum_i
            dW[j, :] += (p - (j == y[i])) * X[:, i]
    # Compute average
    loss /= num_train
    dW /= num_train
    # Regularization
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W
    return loss, dW
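For what it's worth, here is a minimal sketch of how I exercise this function and sanity-check it against a numerical gradient; the shapes, the random data, and the checked indices are my own made-up choices, not part of the assignment.

import numpy as np

# Hypothetical toy problem: C classes, D features, N examples (shapes made up here)
C, D, N = 3, 5, 10
rng = np.random.RandomState(0)
W = rng.randn(C, D) * 0.01
X = rng.randn(D, N)
y = rng.randint(C, size=N)

loss, dW = softmax_loss_naive(W, X, y, reg=0.1)

# Centered-difference check on a few entries of W; analytic and numeric should agree closely
h = 1e-5
for idx in [(0, 0), (1, 2), (2, 4)]:
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += h
    W_minus[idx] -= h
    numeric = (softmax_loss_naive(W_plus, X, y, 0.1)[0]
               - softmax_loss_naive(W_minus, X, y, 0.1)[0]) / (2 * h)
    print(idx, dW[idx], numeric)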
Gradient descent works by minimizing the loss function. In linear regression, that loss is the sum of squared errors. In softmax regression, that loss is the sum, over training examples, of a distance between the true label distribution and the predicted probability distribution. This loss is called the cross-entropy.
In short, Softmax Loss is actually just a Softmax Activation plus a Cross-Entropy Loss. Softmax is an activation function that outputs the probability for each class, and these probabilities sum to one. Cross-entropy loss is just the negative logarithm of the probability assigned to the correct class, summed over the training examples.
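A minimal numpy sketch of that decomposition, using made-up scores and a made-up label index:

import numpy as np

scores = np.array([2.0, 1.0, 0.1])       # made-up raw class scores for one example
label = 0                                 # made-up index of the correct class

# Softmax activation: turn scores into probabilities that sum to one
probs = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
probs /= probs.sum()

# Cross-entropy loss: negative log of the probability assigned to the correct class
loss = -np.log(probs[label])
print(probs, probs.sum(), loss)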
Softmax is a continuously differentiable function. This makes it possible to calculate the derivative of the loss function with respect to every weight in the neural network.
Derivative of Softmax: from the quotient rule we know that for f(x) = g(x)/h(x), we have f'(x) = (g'(x)h(x) - h'(x)g(x)) / h(x)^2. In our case g = e^{a_i} and h = \sum_{k=1}^{N} e^{a_k}. In h, the partial derivative with respect to a_j is always e^{a_j}, since the sum always contains the term e^{a_j}. But note that in g, the partial derivative with respect to a_j is e^{a_j} only if i = j; otherwise it is 0. Putting the two cases together gives \partial s_i / \partial a_j = s_i (Ind{i = j} - s_j), where s_i = e^{a_i} / \sum_k e^{a_k}.
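Here is a small sketch (arbitrary scores, my own softmax helper) that compares the analytic Jacobian from this derivation against a centered finite difference:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

a = np.array([1.0, 2.0, 3.0])             # arbitrary example scores
s = softmax(a)

# Analytic Jacobian from the derivation above: ds_i/da_j = s_i * (Ind{i = j} - s_j)
jac = np.diag(s) - np.outer(s, s)

# Numerical check of the column of derivatives with respect to a_0
h = 1e-5
a_plus, a_minus = a.copy(), a.copy()
a_plus[0] += h
a_minus[0] -= h
numeric_col0 = (softmax(a_plus) - softmax(a_minus)) / (2 * h)
print(jac[:, 0], numeric_col0)            # the two vectors should agree closely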
Not sure if this helps, but: the term Ind{y_i = j} is really the indicator function, as described here. This forms the expression (j == y[i]) in the code.
Also, the gradient of the loss with respect to the weights is:

dw_j = (p_j - Ind{y_i = j}) * x_i

where

p_j = e^{f_j} / \sum_k e^{f_k}

which is the origin of the X[:, i] in the code.
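To make that concrete, here is a sketch of the same per-example contribution pulled out into a standalone function; per_example_grad is just my own name for it, and it assumes the question's layout (W is C x D, x is one D-dimensional column):

import numpy as np

def per_example_grad(W, x, y_i):
    """Gradient contribution of one column x with label y_i (W is C x D)."""
    f = W.dot(x)
    f -= np.max(f)                          # stability shift, as in the question's code
    p = np.exp(f) / np.sum(np.exp(f))       # softmax probabilities, one per class
    dW = np.zeros_like(W)
    for j in range(W.shape[0]):
        # (j == y_i) is the indicator: 1 for the correct class, 0 otherwise
        dW[j, :] = (p[j] - (j == y_i)) * x
    return dW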
I know this is late, but here's my answer:
I'm assuming you are familiar with the cs231n Softmax loss function. We know that:

L_i = -log( e^{f_{y_i}} / \sum_j e^{f_j} )

So, just as we did with the SVM loss function, the gradients are as follows:

dw_j = (p_j - Ind{y_i = j}) * x_i,   with   p_j = e^{f_j} / \sum_k e^{f_k}

Hope that helped.
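If it is useful, a vectorized version of the same loss and gradient might look roughly like this; softmax_loss_vectorized is my own name, it assumes the question's W: C x D, X: D x N layout, and it is only a sketch, so gradient-check it before trusting it:

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """Same loss and gradient as the naive version above, without explicit loops (sketch)."""
    num_train = X.shape[1]
    scores = W.dot(X)                                 # C x N score matrix
    scores -= np.max(scores, axis=0)                  # per-example stability shift
    probs = np.exp(scores) / np.sum(np.exp(scores), axis=0)  # C x N softmax probabilities
    loss = -np.sum(np.log(probs[y, np.arange(num_train)])) / num_train
    loss += 0.5 * reg * np.sum(W * W)
    dscores = probs.copy()
    dscores[y, np.arange(num_train)] -= 1             # p - indicator, one column per example
    dW = dscores.dot(X.T) / num_train + reg * W
    return loss, dW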