 

Can't understand the cost function for Linear Regression

I really can't understand the following equation, especially 1/(2m).

What's the purpose of this equation? And where does 1/(2m) come from?

J(theta_0, theta_1) = 1/(2m) * sum_(i=1)^m [ h_theta(x^i) - y^i ]^2 

Please explain how it is derived.

Asked Jan 13 '14 by Faheem


People also ask

How do you find the cost function in a linear regression?

For a linear regression model, the cost function is typically the mean squared error: subtract the predicted values from the actual values, square the differences, and average them over all samples. Fitting the model means finding the parameters that minimize this value.
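As a rough sketch of that calculation (toy data and a hand-picked hypothesis, not taken from the question above), the cost is obtained by subtracting the predictions from the actual values, squaring, and averaging:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])      # inputs
    y = np.array([2.0, 4.0, 6.0])      # actual outputs
    theta0, theta1 = 0.0, 1.5          # hand-picked hypothesis h(x) = theta0 + theta1 * x

    predictions = theta0 + theta1 * x  # predicted outputs
    errors = predictions - y           # predicted minus actual
    cost = np.mean(errors ** 2)        # mean squared error, approx. 1.17 here

A better-fitting hypothesis (here theta1 = 2.0) would drive this number toward zero.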

Why do we use cost function in linear regression?

The Cost Function of Linear Regression: The cost function measures how well a machine learning model performs. It is the error between the predicted values and the actual values, expressed as a single real number.

Why can't the cost function of linear regression be applied to the logistic regression model?

The hypothesis of logistic regression limits its output to values between 0 and 1. A linear function fails to represent this, since it can take values greater than 1 or less than 0, which is not possible under the logistic regression hypothesis.
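A quick numerical illustration (hypothetical values, just to show the range issue): a linear hypothesis is unbounded, while the logistic (sigmoid) hypothesis always stays strictly between 0 and 1:

    import numpy as np

    z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])  # theta^T x for a few samples

    linear = z                                   # linear hypothesis: can be far below 0 or above 1
    logistic = 1.0 / (1.0 + np.exp(-z))          # sigmoid: always in (0, 1)

    print(linear)    # [-10.  -1.   0.   1.  10.]
    print(logistic)  # [~0.00005  ~0.27  0.5  ~0.73  ~0.99995]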

Is linear regression hard to understand?

In practical terms, linear regression is useful even if you are also using a more complex model for your work. The key is that linear regression is easy to understand, which makes it a good conceptual reference point for what is happening in more complex models.


1 Answer

The cost function is

J(theta_0, theta_1) = 1/(2m) * sum_(i=1)^m [ h_theta(x^i) - y^i ]^2 

By h_theta(x^i) we denote what the model outputs for x^i, so h_theta(x^i) - y^i is its error (assuming that y^i is the correct output).

Now, we take the square of this error, [ h_theta(x^i) - y^i ]^2, which removes the sign (the error could be either positive or negative), and sum it over all samples. To keep it bounded we normalize it, simply by dividing by m, which gives us the mean (because we divide by the number of samples) squared (because we square) error (because we compute an error):

1/m * sum_(i=1)^m [ h_theta(x^i) - y^i ]^2 

The 2 that appears in front is there only to simplify the derivative: when you try to minimize this function, you will use the steepest descent (gradient descent) method, which is based on the derivative of this function. The derivative of a^2 is 2a, and since our function is a square of something, that 2 cancels the 1/2. This is the only reason for its existence.
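Putting the whole thing into code, here is a minimal NumPy sketch (the variable names are my own, not from the original course material) of this cost function and the gradient it leads to; note how the 2 produced by differentiating the square cancels the 1/2 in front:

    import numpy as np

    def cost(theta0, theta1, x, y):
        m = len(y)
        errors = (theta0 + theta1 * x) - y     # h_theta(x^i) - y^i for every sample
        return np.sum(errors ** 2) / (2 * m)   # J(theta_0, theta_1)

    def gradient(theta0, theta1, x, y):
        m = len(y)
        errors = (theta0 + theta1 * x) - y
        # differentiating (1/(2m)) * sum(errors^2) brings down a 2 that cancels the 1/2
        d_theta0 = np.sum(errors) / m
        d_theta1 = np.sum(errors * x) / m
        return d_theta0, d_theta1

One steepest-descent step then updates theta0 -= alpha * d_theta0 and theta1 -= alpha * d_theta1 for some learning rate alpha; dropping the 1/2 would only rescale the gradient, so the minimizing theta_0, theta_1 would be exactly the same.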

Answered Nov 11 '22 by lejlot