How to avoid NaN in numpy implementation of logistic regression?

EDIT: I have already made significant progress. My current question is stated after my last edit below and can be answered without the preceding context.

I am currently following Andrew Ng's Machine Learning course on Coursera and tried to implement logistic regression today.

Notation:

  • X is an (m x n)-matrix whose rows are the input vectors (m training samples with n-1 features each; the entries of the first column are all equal to 1 to represent a constant/intercept term).
  • y is the corresponding vector of expected outputs (a column vector with m entries, each equal to 0 or 1).
  • theta is the vector of model coefficients (a row vector with n entries).

For an input row vector x, the model predicts the probability of a positive outcome as sigmoid(x * theta.T).

This is my Python3/numpy implementation:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

vec_sigmoid = np.vectorize(sigmoid)

def logistic_cost(X, y, theta):
    # cross-entropy cost: -(1/m) * sum( y*log(h) + (1-y)*log(1-h) ), with h = sigmoid(X*theta.T)
    summands = np.multiply(y, np.log(vec_sigmoid(X*theta.T))) + np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T)))
    return - np.sum(summands) / len(y)


def gradient_descent(X, y, learning_rate, num_iterations):
    num_parameters = X.shape[1]                                 # dim theta
    theta = np.matrix([0.0 for i in range(num_parameters)])     # init theta
    cost = [0.0 for i in range(num_iterations)]

    for it in range(num_iterations):
        error = np.repeat(vec_sigmoid(X * theta.T) - y, num_parameters, axis=1)  # prediction error, one copy per parameter column
        error_derivative = np.sum(np.multiply(error, X), axis=0)                 # sum of per-sample gradients; divided by m in the update below
        theta = theta - (learning_rate / len(y)) * error_derivative
        cost[it] = logistic_cost(X, y, theta)

    return theta, cost

This implementation seems to work fine, but I ran into a problem when calculating the logistic cost. At some point gradient descent converges to a theta that fits pretty well, and then the following happens:

For some input row X_i with expected outcome 1, X_i * theta.T becomes positive with a comfortable margin (for example 23.207). This causes sigmoid(X_i * theta.T) to evaluate to exactly 1.0 (I think because of the limited floating-point precision). This is a good prediction (the expected outcome is 1), but it breaks the calculation of the logistic cost: np.log(1 - vec_sigmoid(X*theta.T)) evaluates to -inf (the log of zero). That shouldn't matter, because the term is multiplied by 1 - y = 0, but 0 * -inf evaluates to NaN, and once a NaN appears the whole calculation is broken (the NaN propagates through the sum).
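
Here is a minimal standalone snippet (just to illustrate the mechanism; it is not part of my implementation) showing how a single saturated prediction turns the whole cost into NaN:

import numpy as np

p = np.array([[0.0], [1.0]])    # predictions that have saturated to exactly 0.0 and 1.0
y = np.array([[0.0], [1.0]])    # both predictions are "correct"

# np.log(0) gives -inf; multiplying it by 0 gives nan, and the nan propagates through the sum
summands = np.multiply(y, np.log(p)) + np.multiply(1 - y, np.log(1 - p))
print(summands)           # both entries are nan
print(-np.sum(summands))  # the whole cost is nan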

How should I handle this in the vectorized implementation, given that np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T))) is evaluated for every row of X (not only where y = 0)?

Example input:

X = np.matrix([[1. , 0. , 0. ],
               [1. , 1. , 0. ],
               [1. , 0. , 1. ],
               [1. , 0.5, 0.3],
               [1. , 1. , 0.2]])

y = np.matrix([[0],
               [1],
               [1],
               [0],
               [1]])

Then theta, _ = gradient_descent(X, y, 10000, 10000) (yes, in this case we can set the learning rate this large) sets theta to:

theta = np.matrix([[-3000.04008972,  3499.97995514,  4099.98797308]])

This makes vec_sigmoid(X * theta.T) a really good prediction:

np.matrix([[0.00000000e+00],      # 0
           [1.00000000e+00],      # 1
           [1.00000000e+00],      # 1
           [1.95334953e-09],      # nearly zero
           [1.00000000e+00]])     # 1

but logistic_cost(X, y, theta) evaluates to NaN.

EDIT:

I came up with the following solution. I just replaced the logistic_cost function with:

def new_logistic_cost(X, y, theta):
    term1 = vec_sigmoid(X*theta.T)
    term1[y == 0] = 1
    term2 = 1 - vec_sigmoid(X*theta.T)
    term2[y == 1] = 1
    summands = np.multiply(y, np.log(term1)) + np.multiply(1 - y, np.log(term2))
    return - np.sum(summands) / len(y)

By using the masks I just compute log(1) at the places where the result will be multiplied by zero anyway, so the NaN never appears. Now log(0) will only happen in incorrect implementations of gradient descent.

Open questions: How can I make this solution cleaner? Is it possible to achieve a similar effect in a more elegant way?
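
One slightly more compact variant of the same masking idea (just a sketch; it reuses vec_sigmoid from above and still emits the log-of-zero RuntimeWarning, but the NaN no longer propagates) would be to select the relevant log term per entry with np.where:

def where_logistic_cost(X, y, theta):
    p = np.asarray(vec_sigmoid(X * theta.T))   # plain arrays keep np.where simple
    y_arr = np.asarray(y)
    # pick, per entry, only the log term that actually contributes; the discarded
    # branch may still be -inf, but it is never multiplied or summed
    summands = np.where(y_arr == 1, np.log(p), np.log(1 - p))
    return -np.sum(summands) / len(y)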

asked Aug 17 '18 by s1624210


1 Answer

If you don't mind using SciPy, you could import expit and xlog1py from scipy.special:

from scipy.special import expit, xlog1py

and replace the expression

np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T)))

with

xlog1py(1 - y, -expit(X*theta.T))
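
Put together, a full replacement for the cost function could look like the following sketch (the function name is mine, and I also use scipy.special.xlogy for the first term, which covers the symmetric case where the prediction underflows to exactly 0):

import numpy as np
from scipy.special import expit, xlog1py, xlogy

def scipy_logistic_cost(X, y, theta):
    z = X * theta.T    # logits
    p = expit(z)       # numerically stable sigmoid
    # xlogy(a, b) computes a * log(b) and xlog1py(a, b) computes a * log1p(b);
    # both are defined to return 0 where a == 0, so 0 * log(0) never produces nan
    summands = xlogy(y, p) + xlog1py(1 - y, -p)
    return -np.sum(summands) / len(y)

expit also avoids the overflow warnings that 1 / (1 + np.exp(-x)) can produce for large negative x.
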
answered Nov 02 '22 by Warren Weckesser