gradient descent seems to fail

Tags:

machine-learning

octave

gradient-descent

I implemented a gradient descent algorithm to minimize a cost function in order to obtain a hypothesis for determining whether an image is of good quality. I did this in Octave. The approach is based on the algorithm from Andrew Ng's machine learning class.

I have 880 values in "y", ranging from 0.5 to about 12, and 880 corresponding values in "X", ranging from 50 to 300, which should predict the image's quality.

Unfortunately, the algorithm seems to fail: after some iterations the values of theta become so extreme that theta0 and theta1 turn into "NaN", and my linear regression curve has strange values.
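A likely cause, assuming the raw X values are used without feature scaling, is that alpha = 0.01 is far too large. For the cost J(θ) = (1/2m)·||Xθ − y||², fixed-step gradient descent only converges when alpha < 2/λ_max, where λ_max is the largest eigenvalue of (1/m)·X'X. Above that bound each step overshoots, the values grow until they overflow to Inf, and Inf − Inf gives NaN.

The sketch below computes that bound for a model with a bias column and one feature. The question does not include the actual 880 X values, so the sketch uses hypothetical, evenly spaced values over 50 to 300. The helper name `max_stable_alpha` is illustrative.

```python
import math

def max_stable_alpha(xs):
    """Step-size bound 2 / lambda_max for gradient descent on
    J(theta) = 1/(2m) * ||X*theta - y||^2 with X = [ones, xs]."""
    m = len(xs)
    # Entries of the symmetric 2 x 2 matrix H = (1/m) * X' * X
    a = 1.0                          # mean of 1 * 1
    b = sum(xs) / m                  # mean of x
    d = sum(x * x for x in xs) / m   # mean of x^2
    # Largest eigenvalue of [[a, b], [b, d]] in closed form
    lam_max = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    # Each step scales the error along the top eigenvector by (1 - alpha * lam_max),
    # so |1 - alpha * lam_max| < 1 requires alpha < 2 / lam_max
    return 2.0 / lam_max

# Hypothetical feature values spread evenly over the question's 50..300 range
xs = [50 + 250 * k / 879 for k in range(880)]
print(max_stable_alpha(xs))  # roughly 5.6e-05, far below alpha = 0.01
```

With values in this range, the bound is about 180 times smaller than the alpha used in the question.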

Here is the code for the gradient descent algorithm (called with theta = zeros(2, 1), alpha = 0.01 and iterations = 1500):

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y); % number of training examples
  J_history = zeros(num_iters, 1);

  for iter = 1:num_iters
    % sum of the errors (gradient w.r.t. theta0, times m)
    tmp_j1 = 0;
    for i = 1:m
      tmp_j1 = tmp_j1 + ((theta(1,1) + theta(2,1)*X(i,2)) - y(i));
    end

    % sum of the errors times the feature (gradient w.r.t. theta1, times m)
    tmp_j2 = 0;
    for i = 1:m
      tmp_j2 = tmp_j2 + (((theta(1,1) + theta(2,1)*X(i,2)) - y(i)) * X(i,2));
    end

    % simultaneous update of both parameters
    tmp1 = theta(1,1) - (alpha * ((1/m) * tmp_j1));
    tmp2 = theta(2,1) - (alpha * ((1/m) * tmp_j2));
    theta(1,1) = tmp1;
    theta(2,1) = tmp2;

    % ============================================================
    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
  end
end

And here is the computation for the cost function:

function J = computeCost(X, y, theta)
  m = length(y); % number of training examples
  J = 0;
  tmp = 0;
  for i = 1:m
    tmp = tmp + (theta(1,1) + theta(2,1)*X(i,2) - y(i))^2; % squared difference
  end
  J = (1/(2*m)) * tmp;
end
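For experimenting outside Octave, here is a minimal Python port of the two functions above. It keeps the loops and the simultaneous update; the names `compute_cost` and `gradient_descent` are illustrative. It runs on hypothetical data spanning the question's ranges (x from 50 to 300, y from 0.5 to 12, lying exactly on a line).

Under these assumptions, alpha = 0.01 on the raw feature drives theta to non-finite values. The same alpha converges once the feature is mean-normalized, i.e. replaced by (x − mean)/std. Normalization is one common remedy; choosing alpha well below the stability bound is another.

```python
import math

def compute_cost(X, y, theta):
    """Port of computeCost: J = 1/(2m) * sum((theta0 + theta1*x_i - y_i)^2)."""
    m = len(y)
    total = 0.0
    for i in range(m):
        err = theta[0] + theta[1] * X[i][1] - y[i]
        total += err * err  # plain multiplication: overflow yields inf instead of raising
    return total / (2 * m)

def gradient_descent(X, y, theta, alpha, num_iters):
    """Port of gradientDescent: loop-based, simultaneous update of both parameters."""
    m = len(y)
    theta = list(theta)
    history = []
    for _ in range(num_iters):
        tmp_j1 = 0.0  # sum of errors
        tmp_j2 = 0.0  # sum of errors times the feature
        for i in range(m):
            err = theta[0] + theta[1] * X[i][1] - y[i]
            tmp_j1 += err
            tmp_j2 += err * X[i][1]
        # Both new values are computed from the old theta before either is replaced
        theta = [theta[0] - alpha * tmp_j1 / m,
                 theta[1] - alpha * tmp_j2 / m]
        history.append(compute_cost(X, y, theta))
    return theta, history

# Hypothetical data in the question's ranges: x in 50..300, y in 0.5..12 (exactly linear)
xs = [50 + 250 * k / 879 for k in range(880)]
ys = [0.5 + 0.046 * (x - 50) for x in xs]

# Raw feature with alpha = 0.01: each step overshoots until the values overflow and turn into nan
X_raw = [[1.0, x] for x in xs]
theta_raw, _ = gradient_descent(X_raw, ys, [0.0, 0.0], 0.01, 1500)
print(theta_raw)

# Mean normalization (x - mean) / std lets the same alpha converge
mu = sum(xs) / len(xs)
sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
X_norm = [[1.0, (x - mu) / sigma] for x in xs]
theta_norm, hist = gradient_descent(X_norm, ys, [0.0, 0.0], 0.01, 1500)
print(theta_norm, hist[-1])
```

If you normalize, keep mu and sigma and apply the same transformation to any new x before predicting.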


asked May 07 '12 09:05

Tyzak

2 Answers

If you are wondering how the seemingly complex for loops can be vectorized and condensed into a single-line expression, read on. The vectorized form is:

theta = theta - (alpha/m) * (X' * (X * theta - y))

Below is a detailed explanation of how we arrive at this vectorized expression from the gradient descent algorithm:

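This is the gradient descent update rule used to fine-tune the value of θ:

$$\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{(updated simultaneously for every } j\text{)}$$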
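A direct, loop-based transcription of this rule for any number of columns n can make the indices concrete. This is a sketch in Python with plain lists standing in for Octave matrices; the function name `gradient_step_loops` and the tiny data set are hypothetical.

```python
def gradient_step_loops(X, y, theta, alpha):
    """One step of theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i).
    X is a list of m rows with n entries each; every theta_j is computed from the old theta."""
    m, n = len(X), len(theta)
    # Hypothesis for every training example: h(x^(i)) = sum_j theta_j * x_j^(i)
    h = [sum(theta[j] * X[i][j] for j in range(n)) for i in range(m)]
    new_theta = []
    for j in range(n):
        # Partial derivative of J with respect to theta_j
        grad_j = sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m
        new_theta.append(theta[j] - alpha * grad_j)
    return new_theta

# Tiny hypothetical data set; the first column is the bias feature (always 1)
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 3.0, 4.0]
print(gradient_step_loops(X, y, [0.0, 0.0], 0.1))  # approximately [0.3, 0.6667]
```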

Assume that the following values of X, y and θ are given:

m = number of training examples
n = number of features + 1

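$$X = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & x_4^{(1)} \\ x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & x_4^{(2)} \\ \vdots & \vdots & \vdots & \vdots \\ x_1^{(5)} & x_2^{(5)} & x_3^{(5)} & x_4^{(5)} \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(5)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \theta_4 \end{bmatrix}$$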

Here

m = 5 (training examples)
n = 4 (features + 1)
X = m x n matrix
y = m x 1 vector
θ = n x 1 vector
x^(i) is the i-th training example (the i-th row of X)
x_j is the j-th feature of a given training example (the j-th column of X)

Further,

h(x) = [X] * [θ] (m x 1 vector of predicted values for our training set)
h(x) - y = [X] * [θ] - [y] (m x 1 vector of errors in our predictions)

The whole objective of training is to minimize the errors in our predictions. From the definitions above, our error matrix E is an m x 1 vector:

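$$E = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix} = \begin{bmatrix} h(x^{(1)}) - y^{(1)} \\ h(x^{(2)}) - y^{(2)} \\ \vdots \\ h(x^{(m)}) - y^{(m)} \end{bmatrix}$$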
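To see h(x) = Xθ and E = Xθ − y with concrete numbers, here is a small Python sketch for the m = 5, n = 4 case. The entries of X, y and θ are hypothetical, and `matmul` is an illustrative helper over lists of rows.

```python
def matmul(A, B):
    """Product of an (r x k) and a (k x c) matrix stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical m = 5, n = 4 example (first column is the bias feature 1)
X = [[1, 2, 0, 1],
     [1, 0, 1, 3],
     [1, 1, 1, 0],
     [1, 4, 2, 2],
     [1, 3, 0, 1]]
y = [[3], [5], [2], [9], [4]]   # m x 1
theta = [[1], [1], [1], [1]]    # n x 1

h = matmul(X, theta)                                # m x 1 predictions, h(x) = X * theta
E = [[h[i][0] - y[i][0]] for i in range(len(y))]    # m x 1 errors, E = X * theta - y
print(h)  # [[4], [5], [3], [9], [5]]
print(E)  # [[1], [0], [1], [0], [1]]
```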

To calculate the new value of θ_j, we sum all the errors (m rows), each multiplied by the j-th feature value of the corresponding training example in X. That is, take every value in E, multiply it by the j-th feature of the corresponding training example, and add the results together. This gives us the new (and hopefully better) value of θ_j. Repeat this for every j, i.e. for each of the n columns of X. In matrix form, this can be written as:

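$$\begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} := \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} - \frac{\alpha}{m} \begin{bmatrix} \sum_{i=1}^{m} e_i \, x_1^{(i)} \\ \sum_{i=1}^{m} e_i \, x_2^{(i)} \\ \vdots \\ \sum_{i=1}^{m} e_i \, x_n^{(i)} \end{bmatrix}$$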

This can be simplified as:

$$\theta := \theta - \frac{\alpha}{m} \left( [E]' \, [X] \right)'$$

[E]' x [X] gives a 1 x n row vector, since E' is a 1 x m matrix and X is an m x n matrix. But we want a column vector, so we transpose the result.
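The claim that the per-feature sums are exactly the entries of [E]' x [X] can be checked numerically. This Python sketch uses lists of rows and hypothetical X and E values; `transpose` and `matmul` are illustrative helpers. It compares the loop sums with the 1 x n row E'X and its n x 1 transpose.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an (r x k) and a (k x c) matrix stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical X (m x n = 5 x 4) and error vector E (m x 1)
X = [[1, 2, 0, 1],
     [1, 0, 1, 3],
     [1, 1, 1, 0],
     [1, 4, 2, 2],
     [1, 3, 0, 1]]
E = [[1], [0], [1], [0], [1]]

# Loop form: for each feature j, sum e_i * x_j^(i) over all m examples
sums = [sum(E[i][0] * X[i][j] for i in range(len(X))) for j in range(len(X[0]))]

row = matmul(transpose(E), X)  # E' (1 x m) times X (m x n) -> 1 x n row vector
col = transpose(row)           # transposed into the n x 1 column we want

print(sums)  # [3, 6, 1, 2]
print(row)   # [[3, 6, 1, 2]]
print(col)   # [[3], [6], [1], [2]]
```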

More succinctly, it can be written as:

$$\theta := \theta - \frac{\alpha}{m} \left( (X\theta - y)' \, X \right)'$$

Since (A * B)' = (B' * A'), and A'' = A, we can also write the above as

$$\theta := \theta - \frac{\alpha}{m} \, X' \, (X\theta - y)$$
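As a quick numerical check of the identity used in this step, (E' * X)' equals X' * E. The sketch below uses hypothetical 3 x 2 X and 3 x 1 E values, chosen so the floating-point results are exact, and the same illustrative list-of-rows helpers.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an (r x k) and a (k x c) matrix stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

X = [[1, 2], [1, 5], [1, 7]]    # hypothetical m x n = 3 x 2
E = [[0.5], [-1.0], [2.0]]      # hypothetical m x 1 errors

lhs = transpose(matmul(transpose(E), X))  # (E' * X)'
rhs = matmul(transpose(X), E)             # X' * E, using (A*B)' = B'*A' and E'' = E
print(lhs)  # [[1.5], [10.0]]
print(rhs)  # [[1.5], [10.0]]
```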

This is the original expression we started out with:

theta = theta - (alpha/m) * (X' * (X * theta - y))
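Putting the derivation together, here is a minimal Python sketch that repeats the vectorized update and records the cost. It uses plain lists of rows in place of Octave matrices; the helper names are illustrative, and the data is hypothetical, lying exactly on y = 1 + 2x. Vectorizing changes only readability and speed, not the arithmetic, so an alpha that diverges with the loop version (for example on unscaled features in the 50 to 300 range) diverges here too.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an (r x k) and a (k x c) matrix stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def errors(X, y, theta):
    """E = X*theta - y as an m x 1 column."""
    return [[h[0] - t[0]] for h, t in zip(matmul(X, theta), y)]

def compute_cost(X, y, theta):
    """J = 1/(2m) * E' * E"""
    E = errors(X, y, theta)
    return matmul(transpose(E), E)[0][0] / (2 * len(y))

def gradient_descent(X, y, theta, alpha, num_iters):
    """Repeats theta = theta - (alpha/m) * X' * (X*theta - y)."""
    m = len(y)
    history = []
    for _ in range(num_iters):
        grad = matmul(transpose(X), errors(X, y, theta))  # X' * E, an n x 1 column
        theta = [[th[0] - (alpha / m) * g[0]] for th, g in zip(theta, grad)]
        history.append(compute_cost(X, y, theta))
    return theta, history

# Hypothetical small-scale data lying exactly on y = 1 + 2x
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [[1.0], [3.0], [5.0], [7.0]]
theta, J = gradient_descent(X, y, [[0.0], [0.0]], 0.1, 1500)
print(theta)  # very close to [[1.0], [2.0]]
```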

answered Sep 18 '22 22:09

jerrymouse

I vectorized the theta update; maybe it can help somebody:

theta = theta - (alpha/m *  (X * theta-y)' * X)';
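This form and the one in the previous answer compute the same step, because ((Xθ − y)' * X)' = X' * (Xθ − y). A small Python check on hypothetical data illustrates this; the function names `step_jerrymouse` and `step_markus` are illustrative labels for the two answers' expressions. The sketch applies alpha/m after the transpose, which is mathematically identical to scaling inside it.

```python
import math

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an (r x k) and a (k x c) matrix stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def step_jerrymouse(X, y, theta, alpha):
    """theta - (alpha/m) * (X' * (X*theta - y))"""
    m = len(y)
    E = [[h[0] - t[0]] for h, t in zip(matmul(X, theta), y)]
    g = matmul(transpose(X), E)
    return [[th[0] - (alpha / m) * gi[0]] for th, gi in zip(theta, g)]

def step_markus(X, y, theta, alpha):
    """theta - (alpha/m * (X*theta - y)' * X)'"""
    m = len(y)
    E = [[h[0] - t[0]] for h, t in zip(matmul(X, theta), y)]
    g = transpose(matmul(transpose(E), X))  # ((X*theta - y)' * X)'
    return [[th[0] - (alpha / m) * gi[0]] for th, gi in zip(theta, g)]

# Hypothetical data: bias column plus one feature
X = [[1.0, 50.0], [1.0, 120.0], [1.0, 300.0]]
y = [[0.5], [4.0], [12.0]]
theta = [[0.0], [0.0]]
a = step_jerrymouse(X, y, theta, 1e-5)
b = step_markus(X, y, theta, 1e-5)
print(all(math.isclose(p[0], q[0]) for p, q in zip(a, b)))  # True
```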

answered Sep 22 '22 22:09

Markus

Related questions

RuntimeError: Attempting to deserialize object on a CUDA device
How to detect how similar a speech recording is to another speech recording?
Make predictions using a tensorflow graph from a keras model
Why neural network predicts wrong on its own training data?
How is Elastic Net used?
Error in Confusion Matrix : the data and reference factors must have the same number of levels
How does binary cross entropy loss work on autoencoders?
How to find the features names of the coefficients using scikit linear regression?
NLTK for Named Entity Recognition
Large scale Machine Learning [closed]
Label Smoothing in PyTorch
What is the difference between cross-entropy and log loss error?
What is the default weight initializer in Keras?
How to apply LabelEncoder for a specific column in Pandas dataframe
Simple multi layer neural network implementation [closed]
Data augmentation in test/validation set?
Keep TFIDF result for predicting new content using Scikit for Python
ValueError: feature_names mismatch: in xgboost in the predict() function
Can't understand the cost function for Linear Regression
XGBoost plot_importance doesn't show feature names
