Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Gradient Descent in Matlab

I am taking machine learning class in courseera. The machine learning is a pretty area for me. In first programming exercise I am having some difficulties in gradient decent algorithm. If anyone can help me I will be appreciate.

Here is the instructions for updating thetas;

"You will implement gradient descent in the file gradientDescent.m. The loop structure has been written for you, and you only need to supply the updates to θ within each iteration.

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    %GRADIENTDESCENT Performs gradient descent to learn theta
    %   theta = GRADIENTDESENT(X, y, theta, alpha, num_iters) updates theta by 
    %   taking num_iters gradient steps with learning rate alpha

   % Initialize some useful values
   m = length(y); % number of training examples
   J_history = zeros(num_iters, 1);

   for iter = 1:num_iters

% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
%               theta. 
%
% Hint: While debugging, it can be useful to print out the values
%       of the cost function (computeCost) and gradient here.
%
    % ============================================================

% Save the cost J in every iteration    
J_history(iter) = computeCost(X, y, theta);

end

end

So here is what I did to update thetas simultaneously;

    temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y));
    temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X);
    theta(1,1) = temp0;
    theta(2,1) = temp1;

I am getting error when I run this code. Can anyone help me please?

like image 606
Ram Avatar asked Jun 01 '14 21:06

Ram


1 Answers

I have explained why you can use the vectorized form:

theta = theta - (alpha/m) * (X' * (X * theta - y)); or the equivalent

theta = theta - (alpha/m) * ((X * theta - y)' * X)';

in this answer.

Quoting it below:


Explanation for the matrix version of gradient descent algorithm:

This is the gradient descent algorithm to fine tune the value of θ: enter image description here

Assume that the following values of X, y and θ are given:

  • m = number of training examples
  • n = number of features + 1

enter image description here

Here

  • m = 5 (training examples)
  • n = 4 (features+1)
  • X = m x n matrix
  • y = m x 1 vector matrix
  • θ = n x 1 vector matrix
  • xi is the ith training example
  • xj is the jth feature in a given training example

Further,

  • h(x) = ([X] * [θ]) (m x 1 matrix of predicted values for our training set)
  • h(x)-y = ([X] * [θ] - [y]) (m x 1 matrix of Errors in our predictions)

whole objective of machine learning is to minimize Errors in predictions. Based on the above corollary, our Errors matrix is m x 1 vector matrix as follows:

enter image description here

To calculate new value of θj, we have to get a summation of all errors (m rows) multiplied by jth feature value of the training set X. That is, take all the values in E, individually multiply them with jth feature of the corresponding training example, and add them all together. This will help us in getting the new (and hopefully better) value of θj. Repeat this process for all j or the number of features. In matrix form, this can be written as:

enter image description here

This can be simplified as: enter image description here

  • [E]' x [X] will give us a row vector matrix, since E' is 1 x m matrix and X is m x n matrix. But we are interested in getting a column matrix, hence we transpose the resultant matrix.

More succinctly, it can be written as: enter image description here

The same result can also be written as: enter image description here

like image 162
jerrymouse Avatar answered Oct 04 '22 01:10

jerrymouse