Why does the cost function of logistic regression have a logarithmic expression?

Tags: machine-learning, logarithm, logistic-regression

The cost function for logistic regression is:

cost(h_θ(x), y) = -log(h_θ(x))       if y = 1
cost(h_θ(x), y) = -log(1 - h_θ(x))   if y = 0

My question is: what is the basis for using a logarithmic expression in the cost function? Where does it come from? I believe you can't just put "-log" out of nowhere. If someone could explain the derivation of the cost function, I would be grateful. Thank you.


asked Oct 07 '15 07:10

Nipun Alahakoon

2 Answers

This cost function is simply a reformulation of the maximum-(log-)likelihood criterion.

The model of the logistic regression is:

P(y=1 | x) = logistic(θ x)
P(y=0 | x) = 1 - P(y=1 | x) = 1 - logistic(θ x)
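As a rough illustration of this model, the following Python sketch (standard library only) evaluates both probabilities for a single input. The helper names `logistic`, `dot` and `p_y_given_x`, and the parameter and input values, are hypothetical choices for the example; θ x is treated as a dot product, with a constant 1 feature standing in for the bias.

```python
import math

def logistic(z: float) -> float:
    """Logistic (sigmoid) function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(theta, x) -> float:
    """theta x, written as a dot product of two equal-length lists."""
    return sum(t * xi for t, xi in zip(theta, x))

def p_y_given_x(y: int, theta, x) -> float:
    """P(y | x) under the logistic regression model, for y in {0, 1}."""
    p1 = logistic(dot(theta, x))
    return p1 if y == 1 else 1.0 - p1

theta = [0.5, -1.0]   # hypothetical parameters (bias weight first)
x = [1.0, 2.0]        # hypothetical input; x[0] = 1 acts as the bias feature
print(p_y_given_x(1, theta, x), p_y_given_x(0, theta, x))
```

The two printed probabilities always sum to 1, which is exactly the second line of the model.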

Assuming the training samples are independent, the likelihood is written as:

L = P(y_0, ..., y_n | x_0, ..., x_n) = \prod_i P(y_i | x_i)

The log-likelihood is:

l = log L = \sum_i log P(y_i | x_i)
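A small numerical check, with made-up per-sample probabilities, that taking the log turns the product into a sum. It also shows one practical side effect, assuming IEEE double-precision floats: the raw product of many probabilities underflows to 0.0, while the sum of logs remains usable. `math.prod` needs Python 3.8 or later.

```python
import math

# Hypothetical per-sample probabilities P(y_i | x_i) from some model.
probs = [0.9, 0.8, 0.95, 0.7]

L = math.prod(probs)                  # likelihood: product over samples
l = sum(math.log(p) for p in probs)   # log-likelihood: sum of logs
print(math.log(L), l)                 # equal up to rounding

# With many samples the product underflows (0.5**2000 is far below the
# smallest positive double), but the sum of logs is still representable.
many = [0.5] * 2000
print(math.prod(many))
print(sum(math.log(p) for p in many))
```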

We want to find θ which maximizes the likelihood:

max_θ \prod_i P(y_i | x_i)

Since log is strictly increasing, this is the same as maximizing the log-likelihood:

max_θ \sum_i log P(y_i | x_i)
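Because log is strictly increasing, both objectives peak at the same θ. The sketch below checks this with a crude grid search over a one-dimensional θ on a small hypothetical dataset, chosen so the classes are not linearly separable (which keeps the maximum finite). Grid search is used only for illustration, not as a recommended optimizer.

```python
import math

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 1-D dataset without a bias term, so theta is a single number.
xs = [-2.0, -1.0, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1]

def likelihood(theta: float) -> float:
    """L = prod_i P(y_i | x_i)."""
    L = 1.0
    for x, y in zip(xs, ys):
        p = logistic(theta * x)
        L *= p if y == 1 else 1.0 - p
    return L

def log_likelihood(theta: float) -> float:
    """l = sum_i log P(y_i | x_i)."""
    l = 0.0
    for x, y in zip(xs, ys):
        p = logistic(theta * x)
        l += math.log(p if y == 1 else 1.0 - p)
    return l

grid = [i / 100 for i in range(-500, 501)]   # theta from -5.00 to 5.00
best_L = max(grid, key=likelihood)
best_l = max(grid, key=log_likelihood)
print(best_L, best_l)
```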

We can rewrite this as a minimization of the cost C=-l:

min_θ \sum_i - log P(y_i | x_i)

with

P(y_i | x_i) = logistic(θ x_i)       when y_i = 1
P(y_i | x_i) = 1 - logistic(θ x_i)   when y_i = 0
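Since y_i is either 0 or 1, the two cases can be folded into one expression, -[y_i log h + (1 - y_i) log(1 - h)] with h = logistic(θ x_i). A minimal sketch, with hypothetical data and parameters, that computes the cost C both ways and confirms they agree:

```python
import math

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def nll_cases(theta, X, Y) -> float:
    """C = sum_i -log P(y_i | x_i), using the two cases from the derivation."""
    total = 0.0
    for x, y in zip(X, Y):
        h = logistic(sum(t * xi for t, xi in zip(theta, x)))
        total += -math.log(h) if y == 1 else -math.log(1.0 - h)
    return total

def nll_compact(theta, X, Y) -> float:
    """Same cost via -[y log h + (1 - y) log(1 - h)], valid because y is 0 or 1."""
    total = 0.0
    for x, y in zip(X, Y):
        h = logistic(sum(t * xi for t, xi in zip(theta, x)))
        total += -(y * math.log(h) + (1 - y) * math.log(1.0 - h))
    return total

X = [[1.0, 0.3], [1.0, -1.2], [1.0, 2.0]]   # hypothetical inputs with a bias column
Y = [1, 0, 1]
theta = [0.1, 0.8]                          # hypothetical parameters
print(nll_cases(theta, X, Y), nll_compact(theta, X, Y))
```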

answered Sep 21 '22 11:09

ysdx

Source: my own notes taken during Stanford's Machine Learning course on Coursera, by Andrew Ng. All credit goes to him and this organization. The course is freely available for anyone to take at their own pace. I made the images myself using LaTeX (formulas) and R (graphics).

Hypothesis function

Logistic regression is used when the variable y to be predicted can only take discrete values (i.e., classification).

Considering a binary classification problem (y can only take two values), given a set of parameters θ and a set of input features x, the hypothesis function can be defined so that it is bounded in [0, 1], where g() represents the sigmoid function:

h_θ(x) = g(θ^T x),   where   g(z) = 1 / (1 + e^{-z})

This hypothesis function also represents the estimated probability that y = 1 on input x, parameterized by θ:

h_θ(x) = P(y = 1 | x; θ)
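A minimal Python sketch of this hypothesis, assuming θ and x are plain lists of equal length. The sigmoid is split into two branches so that `math.exp` is only ever called on a non-positive argument and cannot overflow for large |z|; that branching is an implementation choice, not part of the course material. In floating point the output can round to exactly 0.0 or 1.0 for extreme z, so the printed values stay within the closed interval [0, 1].

```python
import math

def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^{-z}), branching so math.exp never gets a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0 here, so ez is in (0, 1) or underflows to 0.0
    return ez / (1.0 + ez)

def hypothesis(theta, x) -> float:
    """h_theta(x) = g(theta^T x) for equal-length lists theta and x."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

for z in (-1000.0, -5.0, 0.0, 5.0, 1000.0):
    print(z, sigmoid(z))
print(hypothesis([1.0, 2.0], [1.0, 0.5]))   # theta^T x = 2.0
```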

Cost function

The cost function represents the optimization objective.

J(θ) = (1/m) \sum_{i=1}^{m} Cost(h_θ(x^{(i)}), y^{(i)})

Although the cost function could be defined as the mean of the squared Euclidean distance (squared error) between the hypothesis h_θ(x) and the actual value y over all m samples in the training set, when the hypothesis is built with the sigmoid function this definition results in a non-convex cost function, so an optimizer such as gradient descent could get stuck in a local minimum before reaching the global minimum. To make the cost function convex (and therefore ensure convergence to the global minimum), the cost is instead defined using the logarithm of the sigmoid function.

Cost(h_θ(x), y) = -log(h_θ(x))       if y = 1
Cost(h_θ(x), y) = -log(1 - h_θ(x))   if y = 0
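The convexity argument can be checked numerically in a simplified setting. This sketch assumes a single hypothetical training example with x = 1 and y = 1, so both candidate costs depend on one scalar θ, and estimates their second derivatives with central finite differences over a grid. A negative second derivative anywhere shows the squared-error cost is not convex; positive values on a grid are only consistent with convexity, which for the log cost follows analytically from its second derivative g(θ)(1 - g(θ)) > 0.

```python
import math

def g(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# One hypothetical training example, so each cost is a function of theta alone.
x, y = 1.0, 1

def squared_error(theta: float) -> float:
    return (g(theta * x) - y) ** 2

def log_cost(theta: float) -> float:
    h = g(theta * x)
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

def second_diff(f, t: float, step: float = 1e-3) -> float:
    """Central finite-difference estimate of f''(t)."""
    return (f(t + step) - 2.0 * f(t) + f(t - step)) / step**2

grid = [i / 10 for i in range(-60, 61)]   # theta from -6.0 to 6.0
sq = [second_diff(squared_error, t) for t in grid]
lg = [second_diff(log_cost, t) for t in grid]
print("squared error: min f'' =", min(sq))   # negative, so not convex
print("log cost:      min f'' =", min(lg))   # positive on the whole grid
```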

This way the optimization objective function can be defined as the mean of the costs/errors in the training set:

J(θ) = -(1/m) \sum_{i=1}^{m} [ y^{(i)} log(h_θ(x^{(i)})) + (1 - y^{(i)}) log(1 - h_θ(x^{(i)})) ]
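To connect this objective to training, here is a rough batch gradient descent sketch in plain Python. It relies on the standard gradient of this J(θ), ∂J/∂θ_j = (1/m) \sum_i (h_θ(x^{(i)}) - y^{(i)}) x_j^{(i)}; the toy data, learning rate and iteration count are arbitrary assumptions, and a real project would more likely use a library optimizer.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def J(theta, X, Y) -> float:
    """Mean cost: -(1/m) * sum(y log h + (1 - y) log(1 - h))."""
    m = len(Y)
    total = 0.0
    for x, y in zip(X, Y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / m

def gradient(theta, X, Y):
    """dJ/dtheta_j = (1/m) * sum((h - y) * x_j)."""
    m = len(Y)
    grad = [0.0] * len(theta)
    for x, y in zip(X, Y):
        err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / m
    return grad

# Hypothetical, non-separable toy data; the first column is the bias feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
Y = [0, 0, 1, 0, 1]

theta = [0.0, 0.0]
alpha = 0.5                          # learning rate (arbitrary choice)
print("initial J:", J(theta, X, Y))
for _ in range(2000):
    grad = gradient(theta, X, Y)
    theta = [t - alpha * gj for t, gj in zip(theta, grad)]
print("final J:", J(theta, X, Y), "theta:", theta)
```

At θ = 0 every prediction is 0.5, so the initial cost is log 2; because J(θ) is convex, gradient descent with a small enough learning rate drives it steadily lower.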


answered Sep 23 '22 11:09

Peque

