
Gradient Descent implementation in Octave

Tags:

octave

I've actually been struggling with this for about two months now. What is it that makes these two versions different?

hypotheses= X * theta
temp=(hypotheses-y)'
temp=X(:,1) * temp
temp=temp * (1 / m)
temp=temp * alpha
theta(1)=theta(1)-temp

hypotheses= X * theta
temp=(hypotheses-y)'
temp=temp * (1 / m)
temp=temp * alpha
theta(2)=theta(2)-temp



theta(1) = theta(1) - alpha * (1/m) * ((X * theta) - y)' * X(:, 1);
theta(2) = theta(2) - alpha * (1/m) * ((X * theta) - y)' * X(:, 2);

The latter works. I'm just not sure why. I struggle to understand the need for the matrix inverse.

narthur157 asked May 14 '12


3 Answers

In your first example, in the second block, you've missed out a step, haven't you? I am assuming you concatenated X with a vector of ones; the missing line is:

   temp=X(:,2) * temp 
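For reference, a minimal corrected sketch of those per-parameter updates might look like this (assuming, as above, that X carries a leading column of ones, so X(:,1) is all ones and X(:,2) is the feature, and that both thetas are updated from the same hypotheses):

hypotheses = X * theta;             % m x 1 vector of predictions
err = hypotheses - y;               % m x 1 vector of errors

grad1 = (1 / m) * err' * X(:,1);    % (1 x m) * (m x 1) gives a scalar
grad2 = (1 / m) * err' * X(:,2);    % likewise a scalar

theta(1) = theta(1) - alpha * grad1;
theta(2) = theta(2) - alpha * grad2;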

Your last example will work, but it can be vectorized further to be simpler and more efficient.

I've assumed you only have one feature; it will work the same with multiple features, since all that happens is you add an extra column to your X matrix for each feature. Basically you add a vector of ones to X to vectorize the intercept.

You can update a 2x1 matrix of thetas in one line of code. Concatenate a vector of ones onto x, making it an n x 2 matrix; then you can calculate h(x) by multiplying it by the theta vector (2x1), which is the (X * theta) part.
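As a quick illustration, with made-up numbers just to show the shapes:

x = [1; 2; 3];             % hypothetical raw feature column (n = 3)
y = [2; 4; 6];             % hypothetical targets
m = length(y);

X = [ones(m, 1), x];       % n x 2 design matrix: intercept column plus the feature
theta = zeros(2, 1);       % 2 x 1 parameter vector

h = X * theta;             % n x 1 vector of predictions, the (X * theta) part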

The second part of the vectorization is to transpose ((X * theta) - y), which gives you a 1 x n matrix; multiplying that by X (an n x 2 matrix) aggregates both (h(x) - y) * x0 and (h(x) - y) * x1, so by definition both thetas are handled at the same time. This results in a 1 x 2 matrix, which I just transpose again to flip the vector around to the same dimensions as the theta vector. I can then do a simple scalar multiplication by alpha and a vector subtraction from theta.

X = data(:, 1);
y = data(:, 2);
m = length(y);
X = [ones(m, 1), data(:, 1)];   % add the intercept column of ones

theta = zeros(2, 1);
iterations = 2000;
alpha = 0.001;

for iter = 1:iterations
    theta = theta - ((1/m) * ((X * theta) - y)' * X)' * alpha;
end
Shaun Ryan answered Oct 09 '22


In the first one, if X were a 3x2 matrix and theta were a 2x1 matrix, then "hypotheses" would be a 3x1 matrix.

Assuming y is a 3x1 matrix, you can perform (hypotheses - y) and get a 3x1 matrix; the transpose of that 3x1 matrix is the 1x3 matrix assigned to temp.

Then that 1x3 matrix is assigned to theta(2), but theta(2) should be a scalar, not a matrix.

The last two lines of your code work because, using my m x n examples above,

(X * theta) 

would be a 3x1 matrix.

Then that 3x1 matrix is subtracted by y (a 3x1 matrix) and the result is a 3x1 matrix.

(X * theta) - y 

So the transpose of the 3x1 matrix is a 1x3 matrix.

((X * theta) - y)' 

Finally, a 1x3 matrix times a 3x1 matrix will equal a scalar (a 1x1 matrix), which is what you are looking for. I'm sure you knew already, but just to be thorough, X(:,2) is the second column of the 3x2 matrix, making it a 3x1 matrix.
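A quick sketch with hypothetical numbers makes those shapes easy to check in Octave:

X = [1 1; 1 2; 1 3];                 % made-up 3x2 matrix: ones column plus one feature
y = [1; 2; 3];                       % made-up 3x1 targets
theta = [0.5; 0.5];                  % 2x1 parameter vector

size(X * theta)                      % 3 1  (the hypotheses)
size((X * theta) - y)                % 3 1  (the errors)
size(((X * theta) - y)')             % 1 3  (transposed errors)
size(((X * theta) - y)' * X(:, 2))   % 1 1  (the scalar update term for theta(2))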

Justin Nafe answered Oct 09 '22


function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Performs gradient descent to learn theta. Updates theta by taking num_iters 
% gradient steps with learning rate alpha.

% Number of training examples
m = length(y); 
% Save the cost J in every iteration in order to plot J vs. num_iters and check for convergence 
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    h = X * theta;                              % m x 1 vector of predictions
    stderr = h - y;                             % m x 1 vector of errors (shadows Octave's built-in stderr; any other name works too)
    theta = theta - (alpha/m) * (stderr' * X)'; % simultaneous, vectorized update of all thetas
    J_history(iter) = computeCost(X, y, theta);
end

end
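A possible way to call this function, with made-up data (computeCost is assumed to be defined elsewhere, e.g. the usual squared-error cost (1/(2*m)) * sum((X*theta - y).^2) from the same exercise):

% hypothetical usage; computeCost(X, y, theta) must exist on the Octave path
data  = [1 2; 2 4; 3 6];            % made-up (x, y) pairs
y     = data(:, 2);
m     = length(y);
X     = [ones(m, 1), data(:, 1)];   % add the intercept column
theta = zeros(2, 1);

[theta, J_history] = gradientDescent(X, y, theta, 0.01, 1500);

theta                                % learned parameters
plot(J_history);                     % the cost should decrease over the iterations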
skeller88 answered Oct 09 '22