I'm doing Andrew Ng's course on Machine Learning and I'm trying to wrap my head around the vectorised implementation of gradient descent for multiple variables which is an optional exercise in the course.
This is the algorithm in question (taken from here):
I just cannot do this in octave using sum
though, I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by the all variables xj(i). I tried different iterations of the following code but to no avail (either the dimensions are not right or the answer is wrong):
theta = theta - alpha/m * sum(X * theta - y) * X;
The correct answer, however, is entirely non-obvious (to a linear algebra beginner like me anyway, from here):
theta = theta - (alpha/m * (X * theta-y)' * X)';
Is there a rule of thumb for cases where sum
is involved that governs transformations like the above?
And if so, is there the opposite version of the above (i.e. going from a sum
based solution to a general multiplication one) as I was able to come up with a correct implementation using sum
for gradient descent for a single variable (albeit not a very elegant one):
temp0 = theta(1) - (alpha/m * sum(X * theta - y));
temp1 = theta(2) - (alpha/m * sum((X * theta - y)' * X(:, 2)));
theta(1) = temp0;
theta(2) = temp1;
Please note that this only concerns vectorised implementations and although there are several questions on SO as to how this is done, my question is primarily concerned with the implementation of the algorithm in Octave using sum
.
The general "rule of the thumb" is as follows, if you encounter something in the form of
SUM_i f(x_i, y_i, ...) g(a_i, b_i, ...)
then you can easily vectorize it (and this is what is done in the above) through
f(x, y, ...)' * g(a, b, ...)
As this is just a typical dot product, which in mathematics (in Euclidean space of finite dimension) looks like
<A, B> = SUM_i A_i B_i = A'B
thus
(X * theta-y)' * X)
is just
<X * theta-y), X> = <H_theta(X) - y, X> = SUM_i (H_theta(X_i) - y_i) X_i
as you can see this works both ways, as this is just a mathematical definition of dot product.
Referring to this part of your question specifically - "I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by the all variables xj(i)."
In Octave you can multiply xj(i) to all the predictions using ".", so it can be written as:
m = size(X, 1);
predictions = X * theta;
sqrErrors = (predictions-y).^2;
J = 1 / (2*m) * sum(sqrErrors);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With