
Implementing gradient descent for multiple variables in Octave using "sum"

I'm doing Andrew Ng's course on Machine Learning and I'm trying to wrap my head around the vectorised implementation of gradient descent for multiple variables which is an optional exercise in the course.

This is the algorithm in question (taken from here):

repeat until convergence:
    theta_j := theta_j - alpha * (1/m) * SUM_i (h_theta(x(i)) - y(i)) * xj(i)    (simultaneously for all j)

I just cannot do this in Octave using sum though. I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by all the variables xj(i). I tried different iterations of the following code but to no avail (either the dimensions are not right or the answer is wrong):

theta = theta - alpha/m * sum(X * theta - y) * X;

The correct answer, however, is entirely non-obvious (to a linear algebra beginner like me anyway, from here):

theta = theta - (alpha/m *  (X * theta-y)' * X)';

Is there a rule of thumb for cases where sum is involved that governs transformations like the above?

And if so, is there an opposite version of the above (i.e. going from a sum-based solution to a general multiplication one)? I ask because I was able to come up with a correct implementation using sum for gradient descent for a single variable (albeit not a very elegant one):

temp0 = theta(1) - (alpha/m * sum(X * theta - y));              % intercept update
temp1 = theta(2) - (alpha/m * sum((X * theta - y)' * X(:, 2))); % slope update

theta(1) = temp0;   % assign only after both are computed,
theta(2) = temp1;   % so the update is simultaneous

Please note that this only concerns vectorised implementations and although there are several questions on SO as to how this is done, my question is primarily concerned with the implementation of the algorithm in Octave using sum.

asked Mar 11 '16 by Nobilis



2 Answers

The general "rule of the thumb" is as follows, if you encounter something in the form of

SUM_i f(x_i, y_i, ...) g(a_i, b_i, ...)

then you can easily vectorize it (and this is what is done above) as

f(x, y, ...)' * g(a, b, ...)

This is just a typical dot product, which in mathematics (in a finite-dimensional Euclidean space) looks like

<A, B> = SUM_i A_i B_i = A'B

thus

(X * theta - y)' * X

is just

<X * theta - y, X> = <H_theta(X) - y, X> = SUM_i (H_theta(X_i) - y_i) X_i

As you can see, this works both ways, since this is just the mathematical definition of the dot product.
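
To make the equivalence concrete, here is a small self-contained Octave check (the toy data is made up purely for illustration): it performs one update step with an explicit sum over training examples and once with the vectorized transpose form, and the two results agree.

% Toy data for illustration only: m = 3 examples, 2 parameters (bias + 1 feature)
X = [1 2; 1 3; 1 4];    % design matrix, first column is the intercept term
y = [2; 3; 4];          % targets
theta = [0; 0];         % current parameters
alpha = 0.1;            % learning rate
m = size(X, 1);

% Sum-based update: one SUM_i per parameter j
theta_sum = theta;
for j = 1:length(theta)
    theta_sum(j) = theta(j) - alpha/m * sum((X * theta - y) .* X(:, j));
end

% Vectorized update: the transpose form shown above
theta_vec = theta - (alpha/m * (X * theta - y)' * X)';

disp(max(abs(theta_sum - theta_vec)))   % 0, up to floating-point error

Note that the loop writes into theta_sum rather than theta, so every j sees the old parameter values; that is exactly what the simultaneous vectorized update does.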

answered Oct 31 '22 by lejlot


Referring to this part of your question specifically: "I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by all the variables xj(i)."

In Octave you can combine xj(i) with all the predictions element-wise using the "." operators (such as .* and .^). For example, the squared-error cost can be written as:

m = size(X, 1);                     % number of training examples
predictions = X * theta;            % h_theta(x(i)) for every example
sqrErrors = (predictions - y) .^ 2; % element-wise squared errors
J = 1 / (2 * m) * sum(sqrErrors);   % cost function J(theta)
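
For completeness, the same "." idea also covers the gradient step the question asks about: multiply the error vector element-wise by column j of X, then sum. This is only a sketch, and it assumes X, y, theta, and alpha are already defined as in the exercise:

m = size(X, 1);                      % number of training examples
errors = X * theta - y;              % h_theta(x(i)) - y(i) for every i
theta_new = theta;                   % copy so every j uses the old theta
for j = 1:length(theta)
    % SUM_i (h_theta(x(i)) - y(i)) * xj(i), computed element-wise
    theta_new(j) = theta(j) - alpha/m * sum(errors .* X(:, j));
end
theta = theta_new;                   % simultaneous update of all parameters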
answered Oct 31 '22 by srinivas