EDIT: I have already made significant progress. My current question is written after my last edit below and can be answered without the earlier context.

I am currently following Andrew Ng's Machine Learning course on Coursera and tried to implement logistic regression today.
Notation:

- X is an (m x n) matrix with one input vector per row (m training samples of n-1 variables; the entries of the first column are all equal to 1 to represent a constant term).
- y is the corresponding vector of expected outputs (a column vector with m entries, each equal to 0 or 1).
- theta is the vector of model coefficients (a row vector with n entries).

For an input row vector x, the model predicts the probability sigmoid(x * theta.T) of a positive outcome.
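For example, a single prediction with a made-up input row and made-up coefficients looks like this:

import numpy as np

x = np.matrix([[1.0, 0.5, 0.3]])       # one input row, leading 1 for the constant term
theta = np.matrix([[-1.0, 2.0, 0.5]])  # made-up coefficients
p = 1 / (1 + np.exp(-(x * theta.T)))   # sigmoid(x * theta.T) = P(outcome = 1), about 0.54 here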
This is my Python3/numpy implementation:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

vec_sigmoid = np.vectorize(sigmoid)

def logistic_cost(X, y, theta):
    summands = np.multiply(y, np.log(vec_sigmoid(X*theta.T))) + np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T)))
    return - np.sum(summands) / len(y)
def gradient_descent(X, y, learning_rate, num_iterations):
    num_parameters = X.shape[1]                               # dim theta
    theta = np.matrix([0.0 for i in range(num_parameters)])   # init theta
    cost = [0.0 for i in range(num_iterations)]

    for it in range(num_iterations):
        error = np.repeat(vec_sigmoid(X * theta.T) - y, num_parameters, axis=1)
        error_derivative = np.sum(np.multiply(error, X), axis=0)
        theta = theta - (learning_rate / len(y)) * error_derivative
        cost[it] = logistic_cost(X, y, theta)

    return theta, cost
This implementation seems to work fine, but I encountered a problem when calculating the logistic cost. At some point the gradient descent algorithm converges to a pretty well-fitting theta, and the following happens:

For some input row X_i with expected outcome 1, X_i * theta.T becomes positive with a good margin (for example 23.207). This leads to sigmoid(X_i * theta.T) evaluating to exactly 1.0 (because of lost floating-point precision, I think). This is a good prediction (since the expected outcome is 1), but it breaks the calculation of the logistic cost, since np.log(1 - vec_sigmoid(X*theta.T)) then evaluates to -inf. This shouldn't be a problem, since the term is multiplied by 1 - y = 0, but 0 * inf evaluates to NaN in floating-point arithmetic, and once a NaN occurs the whole sum is NaN.
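Here is the effect in isolation (the logit 40 is just an illustrative value, large enough to saturate float64):

import numpy as np

z = 40.0                   # illustrative logit; exp(-40) is smaller than float64 machine epsilon
p = 1 / (1 + np.exp(-z))   # rounds to exactly 1.0
print(np.log(1 - p))       # -inf (log of an exact zero)
print(0 * np.log(1 - p))   # nan, because 0 * inf is undefined in IEEE 754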
How should I handle this in the vectorized implementation, given that np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T))) is calculated for every row of X (not only where y = 0)?
Example input:
X = np.matrix([[1. , 0. , 0. ],
               [1. , 1. , 0. ],
               [1. , 0. , 1. ],
               [1. , 0.5, 0.3],
               [1. , 1. , 0.2]])

y = np.matrix([[0],
               [1],
               [1],
               [0],
               [1]])
Then theta, _ = gradient_descent(X, y, 10000, 10000) (yes, in this case we can set the learning rate this large) will set theta to:

theta = np.matrix([[-3000.04008972,  3499.97995514,  4099.98797308]])
This leads to vec_sigmoid(X * theta.T) being the really good prediction:

np.matrix([[0.00000000e+00],  # 0
           [1.00000000e+00],  # 1
           [1.00000000e+00],  # 1
           [1.95334953e-09],  # nearly zero
           [1.00000000e+00]]) # 1
but logistic_cost(X, y, theta) evaluates to NaN.
EDIT:
I came up with the following solution. I just replaced the logistic_cost function with:

def new_logistic_cost(X, y, theta):
    term1 = vec_sigmoid(X*theta.T)
    term1[y == 0] = 1
    term2 = 1 - vec_sigmoid(X*theta.T)
    term2[y == 1] = 1
    summands = np.multiply(y, np.log(term1)) + np.multiply(1 - y, np.log(term2))
    return - np.sum(summands) / len(y)
With the masks I just calculate log(1) at exactly the places where the result is multiplied by zero anyway, so log(0) can now only occur in a genuinely broken run of gradient descent.
Open questions: How can I make this solution cleaner? Is it possible to achieve a similar effect in a more elegant way?
If you don't mind using SciPy, you can import expit and xlog1py from scipy.special:
from scipy.special import expit, xlog1py
and replace the expression
np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T)))
with
xlog1py(1 - y, -expit(X*theta.T))
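If you want to guard the first term the same way, xlogy from scipy.special computes x*log(y) with the same convention (it returns 0 wherever the first factor is 0, just like xlog1py). A sketch of the whole cost function along those lines (logistic_cost_stable is just a name I made up):

import numpy as np
from scipy.special import expit, xlog1py, xlogy

def logistic_cost_stable(X, y, theta):
    p = expit(X * theta.T)                      # numerically stable sigmoid
    # xlogy / xlog1py return 0 wherever the first factor is 0, so saturated
    # predictions no longer produce 0 * (-inf) = NaN
    summands = xlogy(y, p) + xlog1py(1 - y, -p)
    return -np.sum(summands) / len(y)

With the example data above, this returns a finite cost instead of NaN.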