
Local and global minima of the cost function in logistic regression

I'm struggling to understand the idea behind the minima in the derivation of logistic regression.

The idea is to make the hypothesis's prediction as strong as possible (i.e. the probability assigned to the correct class as close to 1 as possible), which in turn requires minimising the cost function $J(\theta)$ as much as possible.

Now I've been told that for this all to work, the cost function must be convex. My understanding of convexity requires there to be no local maxima, and therefore only one minimum: the global minimum. Is this really the case? If not, please explain why. Also, if it's not the case, that implies the possibility of multiple minima in the cost function, and hence multiple sets of parameters yielding higher and higher probabilities. Is this possible? Or can I be certain the returned parameters correspond to the global minimum, and hence the highest probability/best prediction?

asked Oct 09 '16 by Keir Simmons

People also ask

Does logistic regression have local minima?

In a non-convex function there are local minima in addition to the global minimum, and finding the global minimum becomes a difficult task. The cost function used in logistic regression is convex, so gradient descent can find the global minimum.
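
To see this numerically, here is a minimal sketch (my own toy data and hyperparameters, not from the question): gradient descent on the log-loss reaches the same minimizer from two very different starting points, which is exactly what convexity buys you.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (made up for this sketch): 100 points, 2 features, labels
# drawn from a logistic model with hypothetical true weights [1.5, -2.0].
X = rng.normal(size=(100, 2))
p_true = 1 / (1 + np.exp(-X @ np.array([1.5, -2.0])))
y = (rng.uniform(size=100) < p_true).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(theta):
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Two very different starting points converge to the same minimizer:
# with a convex loss there is only one minimum to find.
for start in (np.zeros(2), np.array([-10.0, 10.0])):
    theta = start.copy()
    for _ in range(5000):
        theta -= 0.1 * grad(theta)
    print(start, "->", np.round(theta, 3), "final loss:", round(log_loss(theta), 4))
```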

How do you find the global minima of a cost function?

Take the first derivative of the function, set it to zero, and solve for the critical points. Substitute each critical point back into the function to find where it takes minimum or maximum values. To determine whether a point is a minimum or a maximum, take the second-order derivative: a positive value indicates a minimum, a negative value a maximum. The global minimum is then the critical point with the smallest function value.
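
A one-variable worked example (my own illustration): take $f(x) = x^4 - 2x^2$. Then

$$f'(x) = 4x^3 - 4x = 0 \;\Rightarrow\; x \in \{-1,\, 0,\, 1\}, \qquad f''(x) = 12x^2 - 4,$$

so $f''(0) = -4 < 0$ (local maximum) while $f''(\pm 1) = 8 > 0$ (local minima). Since $f(-1) = f(1) = -1$ is the smallest value $f$ attains, both are also global minima; the second-derivative test only classifies points as local, so you still compare function values to decide which minimum is global.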

How is the cost function represented in logistic regression?

The cost function used in Logistic Regression is Log Loss.
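
Written out for $m$ training examples, with $h_\theta$ the sigmoid hypothesis (standard notation, matching the $J(\theta)$ in the question):

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right], \qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$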

What is local and global minima?

A local minimum of a function is a point where the function value is smaller than at nearby points, but possibly greater than at a distant point. A global minimum is a point where the function value is smaller than at all other feasible points.


1 Answer

The fact that we use a convex cost function does not guarantee a convex problem.

There is a distinction between a convex cost function and a convex method.

The typical cost functions you encounter (cross entropy, absolute loss, least squares) are designed to be convex.

However, the convexity of the problem depends also on the type of ML algorithm you use.

Linear models (linear regression, logistic regression, etc.) give you convex optimization problems, so gradient descent will converge to the global minimum. When using neural nets with hidden layers, however, you are no longer guaranteed a convex problem.
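
To see why the linear case is convex (a standard derivation, not part of the original answer): for logistic regression the Hessian of the log-loss is

$$\nabla^2 J(\theta) = \frac{1}{m} X^T S X, \qquad S = \operatorname{diag}\left(h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right)\right),$$

which is positive semi-definite since every diagonal entry of $S$ lies in $(0, 1/4]$, so $J(\theta)$ is convex. Once the parameters enter through a hidden layer's nonlinearity, the Hessian can have negative eigenvalues and this guarantee disappears.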

Thus, convexity describes your whole method (the model together with the cost function), not only your cost function!

LR is a linear classification method, so you get a convex optimization problem each time you use it! Convexity only guarantees that you find the best linear decision boundary, though. If the data is not linearly separable, that optimum may still be a poor classifier; and if the data is perfectly separable, the unregularized optimum is not even attained, because the loss keeps decreasing as the weights grow without bound.
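
The separable caveat is easy to see numerically (a minimal sketch; the four-point dataset and step counts are my own):

```python
import numpy as np

# Four 1-D points, perfectly separable at x = 0 (hypothetical data).
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

theta = np.zeros(1)
for step in range(1, 200001):
    theta -= 0.5 * X.T @ (sigmoid(X @ theta) - y) / len(y)
    if step % 50000 == 0:
        h = np.clip(sigmoid(X @ theta), 1e-15, 1 - 1e-15)
        loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
        print(f"step {step:6d}: theta = {theta[0]:6.2f}, loss = {loss:.6f}")

# theta keeps growing while the loss creeps toward (but never reaches)
# zero: the unregularized optimum is not attained. An L2 penalty would
# restore a finite minimizer.
```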

answered Mar 27 '23 by Peter Dimmar