I'm misunderstanding the idea behind the minima in the derivation of the logistic regression formula.
The idea is to make the hypothesis as confident as possible for correct predictions (i.e. push the predicted probability of the correct class as close to 1 as possible), which in turn requires minimising the cost function $J(\theta)$ as much as possible.
Now I've been told that for this all to work, the cost function must be convex. My understanding is that a convex function has no local maxima, and therefore only one minimum, the global minimum. Is this really the case? If not, please explain why. Also, if it's not the case, that implies the possibility of multiple minima in the cost function, and therefore multiple sets of parameters yielding higher and higher probabilities. Is this possible? Or can I be certain that the returned parameters correspond to the global minimum, and hence the highest probability/best prediction?
Cost function in Logistic Regression
In a non-convex function there are local minima in addition to the global minimum, and finding the global minimum is a difficult task. With a convex cost function, gradient descent can reach the global minimum.
To find the extrema of a function, set its first derivative to zero and solve for $x$; the solutions are the candidate points. To determine whether each candidate is a minimum or a maximum, take the second derivative there: a positive value indicates a local minimum, a negative value a local maximum. Comparing the function values at the candidates then identifies the global minimum.
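As a small illustration (my own sketch, not part of the answer above; the function $x^3 - 3x$ is an arbitrary example), this procedure can be automated with sympy:

```python
# Find critical points of f(x) = x**3 - 3*x and classify them
# with the second-derivative test.
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x

critical_points = sp.solve(sp.diff(f, x), x)  # solve f'(x) = 0
for p in critical_points:
    second = sp.diff(f, x, 2).subs(x, p)      # evaluate f''(p)
    if second > 0:
        print(p, 'local minimum')
    elif second < 0:
        print(p, 'local maximum')
    else:
        print(p, 'test inconclusive')
```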
The cost function used in Logistic Regression is Log Loss.
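For reference, with $m$ training examples, labels $y^{(i)} \in \{0, 1\}$, and hypothesis $h_\theta(x) = 1/(1 + e^{-\theta^T x})$, log loss is

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right],$$

which is convex in $\theta$.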
A local minimum of a function is a point where the function value is smaller than at nearby points, but possibly greater than at a distant point. A global minimum is a point where the function value is smaller than at all other feasible points.
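To make the distinction concrete (again my own sketch, not from the answer), gradient descent on the non-convex quartic $f(x) = x^4 - 3x^2 + x$ ends up in different minima depending on where it starts:

```python
# Gradient descent on a non-convex function: f(x) = x**4 - 3*x**2 + x.
# It has two minima, and the starting point decides which one we find.
def grad(x):
    return 4 * x**3 - 6 * x + 1  # f'(x)

def descend(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(-2.0))  # ~ -1.30: the global minimum, f ≈ -3.51
print(descend(+2.0))  # ~ +1.13: only a local minimum, f ≈ -1.07
```

This is exactly the failure mode a convex cost function rules out.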
The fact that we use a convex cost function does not guarantee a convex problem.
There is a distinction between a convex cost function and a convex method.
The typical cost functions you encounter (cross entropy, absolute loss, least squares) are designed to be convex.
However, the convexity of the problem depends also on the type of ML algorithm you use.
Linear models (linear regression, logistic regression, etc.) give you convex optimization problems, so gradient descent converges to the global minimum. With neural networks that have hidden layers, however, the objective is no longer convex in the parameters, and you are not guaranteed to find the global minimum.
Thus, convexity describes your method (the model combined with the loss), not just your cost function!
LR is a linear classification method, so you get a convex optimization problem each time you use it! However, convexity only guarantees that you find the best parameters for a linear decision boundary; if the data is not linearly separable, even that best boundary may classify poorly.
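Putting the pieces together, here is a minimal NumPy sketch (my own toy example, with synthetic data and slightly noisy labels so the problem is not perfectly separable) showing that on the convex log loss the starting point does not matter:

```python
# Gradient descent for logistic regression on toy data.
# Log loss is convex in theta for a linear model, so runs started
# from very different points converge to (nearly) the same theta.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                          # toy features
noise = rng.normal(scale=0.5, size=100)                # keeps data non-separable
y = (X[:, 0] + X[:, 1] + noise > 0).astype(float)      # toy labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(theta, lr=0.1, steps=5000):
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)  # gradient of log loss
        theta = theta - lr * grad
    return theta

print(fit(np.zeros(2)))            # same minimum...
print(fit(np.array([5.0, -5.0])))  # ...from a very different start
```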