 

Can't understand the cost function for Linear Regression

I really can't understand the following equation, especially 1/(2m).

What's the purpose of this equation? And where does 1/(2m) come from?

J(theta_0, theta_1) = 1/(2m) * sum_(i=1)^m [ h_theta(x^i) - y^i ]^2 

Please explain how it is derived.

Asked Jan 13 '14 by Faheem


People also ask

How do you find the cost function in a linear regression?

For a linear regression model, the cost function is typically the mean squared error: subtract the predicted values from the actual values, square the differences, and average them over all samples. Fitting the model means finding the parameters that minimize this value.
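As a rough sketch of that calculation (toy data and a hand-picked hypothesis, not taken from the question above), the cost is obtained by subtracting the predictions from the actual values, squaring, and averaging:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])      # inputs
    y = np.array([2.0, 4.0, 6.0])      # actual outputs
    theta0, theta1 = 0.0, 1.5          # hand-picked hypothesis h(x) = theta0 + theta1 * x

    predictions = theta0 + theta1 * x  # predicted outputs
    errors = predictions - y           # predicted minus actual
    cost = np.mean(errors ** 2)        # mean squared error, approx. 1.17 here

A better-fitting hypothesis (here theta1 = 2.0) would drive this number toward zero.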

Why do we use cost function in linear regression?

The Cost Function of Linear Regression: The cost function measures how well a machine learning model performs. It is the error between the predicted values and the actual values, expressed as a single real number.

Why can't the cost function of linear regression be applied to the logistic regression model?

The hypothesis of logistic regression limits its output to values between 0 and 1. A linear function fails to represent this, since it can take values greater than 1 or less than 0, which is not possible under the logistic regression hypothesis.
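A quick numerical illustration (hypothetical values, just to show the range issue): a linear hypothesis is unbounded, while the logistic (sigmoid) hypothesis always stays strictly between 0 and 1:

    import numpy as np

    z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])  # theta^T x for a few samples

    linear = z                                   # linear hypothesis: can be far below 0 or above 1
    logistic = 1.0 / (1.0 + np.exp(-z))          # sigmoid: always in (0, 1)

    print(linear)    # [-10.  -1.   0.   1.  10.]
    print(logistic)  # [~0.00005  ~0.27  0.5  ~0.73  ~0.99995]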

Is linear regression hard to understand?

In practical terms, linear regression is useful even if you are also using a more complex model for your work. The key is that linear regression is easy to understand, which makes it a good conceptual reference point for what is happening in more complex models.


1 Answer

The cost function is

J(theta_0, theta_1) = 1/(2m) * sum_(i=1)^m [ h_theta(x^i) - y^i ]^2 

By h_theta(x^i) we denote what the model outputs for x^i, so h_theta(x^i) - y^i is its error (assuming that y^i is the correct output).

Now, we take the square of this error, [ h_theta(x^i) - y^i ]^2, which removes the sign (the error could be either positive or negative), and sum it over all samples. To keep it bounded we normalize it, simply by dividing by m, which gives us the mean (because we divide by the number of samples) squared (because we square) error (because we compute an error):

1/m * sum_(i=1)^m [ h_theta(x^i) - y^i ]^2 

The 2 that appears in front is there only to simplify the derivative: when you try to minimize this function, you will use the steepest descent (gradient descent) method, which is based on the derivative of this function. The derivative of a^2 is 2a, and since our function is a square of something, that 2 cancels the 1/2. This is the only reason for its existence.
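Putting the whole thing into code, here is a minimal NumPy sketch (the variable names are my own, not from the original course material) of this cost function and the gradient it leads to; note how the 2 produced by differentiating the square cancels the 1/2 in front:

    import numpy as np

    def cost(theta0, theta1, x, y):
        m = len(y)
        errors = (theta0 + theta1 * x) - y     # h_theta(x^i) - y^i for every sample
        return np.sum(errors ** 2) / (2 * m)   # J(theta_0, theta_1)

    def gradient(theta0, theta1, x, y):
        m = len(y)
        errors = (theta0 + theta1 * x) - y
        # differentiating (1/(2m)) * sum(errors^2) brings down a 2 that cancels the 1/2
        d_theta0 = np.sum(errors) / m
        d_theta1 = np.sum(errors * x) / m
        return d_theta0, d_theta1

One steepest-descent step then updates theta0 -= alpha * d_theta0 and theta1 -= alpha * d_theta1 for some learning rate alpha; dropping the 1/2 would only rescale the gradient, so the minimizing theta_0, theta_1 would be exactly the same.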

Answered Nov 11 '22 by lejlot