I am coding gradient descent in MATLAB. For two features, my update step is:
temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y).*X(:,1));
temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X(:,2));
theta(1,1) = temp0;
theta(2,1) = temp1;
However, I want to vectorize this code and be able to apply it to any number of features. For the vectorization part, it turns out that what I am trying to do is a matrix multiplication:
theta = theta - (alpha/m) * (X' * (X*theta-y));
This looks right, but when I tried it, I thought it would not work for gradient descent, because the parameters are not updated simultaneously.
So how can I vectorize this code and make sure the parameters are updated at the same time?
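For reference, here is a minimal setup I am testing with (the data, sizes, and learning rate below are made up purely for illustration):

% Made-up example: m training examples, intercept column plus one feature
m = 5;
X = [ones(m,1), (1:m)'];   % design matrix with a column of ones
y = [2; 4; 6; 8; 10];      % target values
theta = zeros(2,1);        % initial parameters
alpha = 0.01;              % learning rate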
The gradient descent algorithm iteratively calculates the next point using the gradient at the current position, scales it by a learning rate, and subtracts the obtained value from the current position (takes a step). It subtracts because we want to minimise the function (to maximise it, we would add).
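As a one-line sketch in MATLAB, a single step might look like this, assuming gradf is a function handle that returns the gradient at x and alpha is the learning rate:

% One descent step: move against the gradient, scaled by the learning rate
% (gradf and alpha are assumed to be defined elsewhere)
x = x - alpha * gradf(x);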
Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values (a vector) at one time. Modern CPUs provide direct support for vector operations, where a single instruction is applied to multiple data (SIMD).
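As a simple illustration (the vectors here are made up), the loop form and the vectorized form below compute the same result, but the second expresses the whole operation at once:

% Loop form: square each element one at a time
v = [1 2 3 4];
w = zeros(size(v));
for i = 1:numel(v)
    w(i) = v(i)^2;
end

% Vectorized form: one elementwise operation on the whole vector
w = v.^2;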
In machine learning, "vectorization" also has a second meaning, as a step in feature extraction: converting text into numerical vectors so that the model has distinct features to train on. That sense is unrelated to the code vectorization discussed here.
The gradient descent procedure is an algorithm for finding the minimum of a function. Suppose we have a function f(x), where x is a tuple of several variables, i.e., x = (x_1, x_2, …, x_n), and suppose that the gradient of f(x) is given by ∇f(x). We want to find the values of the variables (x_1, x_2, …, x_n) at which f(x) attains its minimum.
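A minimal sketch, assuming the one-variable function f(x) = x^2, whose gradient is 2*x (so the minimum is at x = 0):

% Gradient descent on f(x) = x^2; the gradient is 2*x
x = 10;           % assumed starting point
alpha = 0.1;      % assumed learning rate
for iter = 1:100
    x = x - alpha * 2*x;   % step against the gradient
end
disp(x)           % ends up very close to the minimum at 0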
For the vectorized version, try the following (two steps, to make the simultaneous update explicit):
gradient = (alpha/m) * X' * (X*theta - y);
theta = theta - gradient;
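Wrapped in a full descent loop, this might look like the following sketch (num_iters and the starting theta are assumed):

% Hypothetical outer loop around the vectorized update
num_iters = 1500;                  % assumed number of iterations
for iter = 1:num_iters
    gradient = (alpha/m) * X' * (X*theta - y);
    theta = theta - gradient;      % every parameter updated at once
end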
Your vectorization is correct. I also tried both versions of your code, and they gave me the same theta. Just remember not to use the already-updated theta in your second implementation.
This also works, but it is less compact than your second implementation:
Error = X * theta - y;             % residuals for the current theta
for i = 1:2
    S(i) = sum(Error .* X(:,i));   % gradient component for feature i
end
theta = theta - alpha * (1/m) * S';
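To apply the same loop to any number of features, the bound can be taken from the width of X instead of being hard-coded; a small sketch of that generalisation:

% Generalised to n features: iterate over every column of X
Error = X * theta - y;
n = size(X, 2);
S = zeros(1, n);                 % preallocate the gradient accumulator
for i = 1:n
    S(i) = sum(Error .* X(:,i));
end
theta = theta - alpha * (1/m) * S';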