Hi I'm trying to implement Gradient Descent algorithm for a function:
My starting point for the algorithm is w = (u,v) = (2,2). The learning rate is eta = 0.01 and bound = 10^-14. Here is my MATLAB code:
function [resultTable, boundIter] = gradientDescent(w, iters, bound, eta)
% FUNCTION [resultTable, boundIter] = gradientDescent(w, its, bound, eta)
%
% DESCRIPTION:
% - This function will do gradient descent error minimization for the
% function E(u,v) = (u*exp(v) - 2*v*exp(-u))^2.
%
% INPUTS:
% 'w' a 1-by-2 vector indicating initial weights w = [u,v]
% 'its' a positive integer indicating the number of gradient descent
% iterations
% 'bound' a real number indicating an error lower bound
% 'eta' a positive real number indicating the learning rate of GD algorithm
%
% OUTPUTS:
% 'resultTable' a iters+1-by-6 table indicating the error, partial
% derivatives and weights for each GD iteration
% 'boundIter' a positive integer specifying the GD iteration when the error
% function got below the given error bound 'bound'
%
% The error function
E = @(u,v) (u*exp(v) - 2*v*exp(-u))^2;
% Partial derivative of E with respect to u
pEpu = @(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(exp(v) + 2*v*exp(-u));
% Partial derivative of E with respect to v
pEpv = @(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(u*exp(v) - 2*exp(-u));
% Initialize boundIter
boundIter = 0;
% Create a table for holding the results
resultTable = zeros(iters+1, 6);
% Iteration number
resultTable(1, 1) = 0;
% Error at iteration i
resultTable(1, 2) = E(w(1), w(2));
% The value of pEpu at initial w = (u,v)
resultTable(1, 3) = pEpu(w(1), w(2));
% The value of pEpv at initial w = (u,v)
resultTable(1, 4) = pEpv(w(1), w(2));
% Initial u
resultTable(1, 5) = w(1);
% Initial v
resultTable(1, 6) = w(2);
% Loop all the iterations
for i = 2:iters+1
% Save the iteration number
resultTable(i, 1) = i-1;
% Update the weights
temp1 = w(1) - eta*(pEpu(w(1), w(2)));
temp2 = w(2) - eta*(pEpv(w(1), w(2)));
w(1) = temp1;
w(2) = temp2;
% Evaluate the error function at new weights
resultTable(i, 2) = E(w(1), w(2));
% Evaluate pEpu at the new point
resultTable(i, 3) = pEpu(w(1), w(2));
% Evaluate pEpv at the new point
resultTable(i, 4) = pEpv(w(1), w(2));
% Save the new weights
resultTable(i, 5) = w(1);
resultTable(i, 6) = w(2);
% If the error function is below a specified bound save this iteration
% index
if E(w(1), w(2)) < bound
boundIter = i-1;
end
end
This is an exercise in my machine learning course, but for some reason my results are all wrong. There must be something wrong in the code. I have tried debugging and debugging it and haven't found anything wrong...can someone identify what is my problem here?...In other words can you check that the code is valid gradient descent algorithm for the given function?
Please let me know if my question is too unclear or if you need more info :)
Thank you for your effort and help! =)
Here is my results for five iterations and what other people got:
PARAMETERS: w = [2,2], eta = 0.01, bound = 10^-14, iters = 5
As discussed below the question: I would say the others are wrong... your minimization leads to smaller values of E(u,v)
, check:
E(1.4,1.6) = 37.8 >> 3.6 = E(0.63, -1.67)
Not a complete answer but lets go for it:
I added a plotting part in your code, so you can see whats going on.
u1=resultTable(:,5);
v1=resultTable(:,6);
E1=E(u1,v1);
E1(E1<bound)=NaN;
[x,y]=meshgrid(-1:0.1:5,-5:0.1:2);Z=E(x,y);
surf(x,y,Z)
hold on
plot3(u1,v1,E1,'r')
plot3(u1,v1,E1,'r*')
The result shows that your algorithm is doing the right thing for that function. So, as other said, or all the others are wrong, or you are not using the right equation from the beggining.
(I apologize for not just commenting, but I'm new to SO and cannot comment.)
It appears that your algorithm is doing the right thing. What you want to be sure is that at each step the energy is shrinking (which it is). There are several reasons why your data points may not agree with the others in the class: they could be wrong (you or others in the class), they perhaps started at a different point, they perhaps used a different step size (what you are calling eta I believe).
Ideally, you don't want to hard-code the number of iterations. You want to continue until you reach a local minimum (which hopefully is the global minimum). To check this, you want both partial derivatives to be zero (or very close). In addition, to make sure you're at a local min (not a local max, or saddle point) you should check the sign of E_uu*E_vv - E_uv^2 and the sign of E_uu look at: http://en.wikipedia.org/wiki/Second_partial_derivative_test for details (the second derivative test, at the top). If you find yourself at a local max or saddle point, your gradient will tell you not to move (since the partial derivatives are 0). Since you know this isn't optimal, you have to just perturb your solution (sometimes called simulated annealing).
Hope this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With