I'm working on this class on convolutional neural networks. I've been trying to implement the gradient of the loss function for an SVM and (I have a copy of the solution) I'm having trouble understanding why the solution is correct.
On this page the gradient of the per-example loss is defined as ∇_{w_{y_i}} L_i = -(∑_{j≠y_i} 1(w_j^T x_i - w_{y_i}^T x_i + Δ > 0)) x_i for the correct class, and ∇_{w_j} L_i = 1(w_j^T x_i - w_{y_i}^T x_i + Δ > 0) x_i for every other class j ≠ y_i. My analytic gradient matches the numeric one when implemented in code as follows:
import numpy as np  # import added for completeness; W, X and y come from the surrounding function

dW = np.zeros(W.shape) # initialize the gradient as zero

# compute the loss and the gradient
num_classes = W.shape[1]
num_train = X.shape[0]
loss = 0.0
for i in xrange(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in xrange(num_classes):
        if j == y[i]:
            continue
        margin = scores[j] - correct_class_score + 1 # note delta = 1
        if margin > 0:
            dW[:, y[i]] += -X[i] # correct class column: subtract X[i] once per violating class j
            dW[:, j] += X[i] # gradient update for incorrect rows
            loss += margin
However, from the notes it seems like dW[:, y[i]] should be changed every time j == y[i], since that is the term we subtract in the loss. I'm very confused why the code is not:
dW = np.zeros(W.shape) # initialize the gradient as zero

# compute the loss and the gradient
num_classes = W.shape[1]
num_train = X.shape[0]
loss = 0.0
for i in xrange(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in xrange(num_classes):
        if j == y[i]:
            if margin > 0:
                dW[:, y[i]] += -X[i]
            continue
        margin = scores[j] - correct_class_score + 1 # note delta = 1
        if margin > 0:
            dW[:, j] += X[i] # gradient update for incorrect rows
            loss += margin
so that dW[:, y[i]] would change when j == y[i]. Why are both gradient updates being computed when j != y[i]?
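For reference, the way I compare the analytic and numeric gradients is essentially a centered finite-difference spot check like the sketch below (my own simplified version rather than the course's checker; svm_loss_naive here just stands for whatever function returns (loss, dW)):

import numpy as np

def numeric_grad_spot_check(loss_fn, W, num_checks=5, h=1e-5):
    # loss_fn(W) must return (loss, dW); compare dW entries to centered differences
    _, dW = loss_fn(W)
    for _ in range(num_checks):
        ix = tuple(np.random.randint(d) for d in W.shape) # pick a random entry of W
        old_val = W[ix]
        W[ix] = old_val + h
        loss_plus, _ = loss_fn(W)
        W[ix] = old_val - h
        loss_minus, _ = loss_fn(W)
        W[ix] = old_val # restore the entry
        grad_numeric = (loss_plus - loss_minus) / (2 * h)
        print('numeric: %f  analytic: %f' % (grad_numeric, dW[ix]))

# usage (names are illustrative): numeric_grad_spot_check(lambda W: svm_loss_naive(W, X_dev, y_dev, 0.0), W)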
I don't have enough reputation to comment, so I am answering here. Whenever you compute the loss for the i-th training example x[i] and get some nonzero loss, it means you should move the weight vector of the incorrect class (j != y[i]) away from x[i], and at the same time move the weights, i.e. the hyperplane, of the correct class (j == y[i]) nearer to x[i]. By the parallelogram law, w + x lies between w and x, so adding x[i] to w[y[i]] pulls it closer to x[i] every time the loss is > 0.
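As a quick numerical illustration of that parallelogram argument (a toy sketch with made-up two-dimensional vectors, not anything from the assignment), adding x to w increases the score w·x and shrinks the angle between the two vectors:

import numpy as np

w = np.array([1.0, 0.0]) # current weight vector of the correct class
x = np.array([0.0, 1.0]) # a training example that currently scores 0 under w
w_new = w + x # the direction in which the correct class column moves

cosine = lambda a, b: a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine(w, x), cosine(w_new, x)) # 0.0 -> ~0.707: w_new points more towards x
print(w.dot(x), w_new.dot(x)) # 0.0 -> 1.0: the score of x for the correct class goes up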
Thus, dW[:, y[i]] += -X[i] and dW[:, j] += X[i] are accumulated inside the loop, but the weight update is taken in the direction of decreasing gradient, so we are essentially adding X[i] to the correct class's weights and moving away from X[i] for the weights that misclassify.
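To make the sign flip concrete (a minimal sketch; the shapes, seed and learning rate are arbitrary, and the descent step W -= learning_rate * dW is the standard one rather than code from the assignment):

import numpy as np

np.random.seed(0)
D, C = 4, 3 # toy sizes: 4 features, 3 classes
W = np.random.randn(D, C)
x = np.random.randn(D)
y_i, j = 1, 2 # correct class and one margin-violating class

dW = np.zeros_like(W)
dW[:, y_i] += -x # accumulated in the loop for the correct class
dW[:, j] += x # accumulated in the loop for the violating class

learning_rate = 1e-2
W_new = W - learning_rate * dW # gradient descent step
print(np.allclose(W_new[:, y_i], W[:, y_i] + learning_rate * x)) # True: correct class moves toward x
print(np.allclose(W_new[:, j], W[:, j] - learning_rate * x)) # True: violating class moves away from x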