 

Logistic regression in Julia using Optim.jl

I'm trying to implement a simple regularized logistic regression algorithm in Julia. I'd like to use the Optim.jl library to minimize my cost function, but I can't get it to work.

My cost function and gradient are as follows:

function cost(X, y, theta, lambda)
    m = length(y)
    h = sigmoid(X * theta)
    reg = (lambda / (2*m)) * sum(theta[2:end].^2)
    J = (1/m) * sum( (-y).*log(h) - (1-y).*log(1-h) ) + reg
    return J
end

function grad(X, y, theta, lambda, gradient)
    m = length(y)
    h = sigmoid(X * theta)
    # gradient = zeros(size(theta))
    gradient = (1/m) * X' * (h - y)
    gradient[2:end] = gradient[2:end] + (lambda/m) * theta[2:end]
    return gradient
end

(Where theta is a vector of parameters for the hypothesis function and lambda is a regularization parameter.)
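
(The code above also calls a sigmoid helper that isn't shown in my listing; it is just the usual elementwise logistic function, something like:)

sigmoid(z) = 1.0 ./ (1.0 + exp(-z))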

Then, following the instructions given here: https://github.com/JuliaOpt/Optim.jl, I try to call the optimization function like this:

# these are wrapper functions I define so I can pass them as arguments:
c(theta::Vector) = cost(X, y, theta, lambda)
g!(theta::Vector, gradient::Vector) = grad(X, y, theta, lambda, gradient)

# then I do
optimize(c,some_initial_theta) 
# or maybe
optimize(c,g!,initial_theta,method = :l_bfgs) # try a different algorithm

In both cases it reports that it fails to converge, and the output looks kind of awkward:

julia> optimize(c,initial_theta)
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0,0.0,0.0,0.0]
 * Minimum: [1.7787162051775145,3.4584135105727145,-6.659680628594007,4.776952006060713,1.5034743945407143]
 * Value of Function at Minimum: -Inf
 * Iterations: 1000
 * Convergence: false
   * |x - x'| < NaN: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
   * |g(x)| < NaN: false
   * Exceeded Maximum Number of Iterations: true
 * Objective Function Calls: 1013
 * Gradient Call: 0

julia> optimize(c,g!,initial_theta,method = :l_bfgs)
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.0,0.0,0.0,0.0,0.0]
 * Minimum: [-6.7055e-320,-2.235e-320,-6.7055e-320,-2.244e-320,-6.339759952602652e-7]
 * Value of Function at Minimum: 0.693148
 * Iterations: 1
 * Convergence: false
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
   * |g(x)| < 1.0e-08: false
   * Exceeded Maximum Number of Iterations: false
 * Objective Function Calls: 75
 * Gradient Call: 75

Question

Is my method (from my first code listing) incorrect? Or am I misusing Optim.jl functions? Either way, what is the proper way to define and minimize the cost function here?

This is my first time with Julia, and I'm probably doing something terribly wrong, but I can't tell what exactly. Any help will be appreciated!

EDIT

X and y are the training set: X is a 90x5 matrix and y a 90x1 vector (the training set is taken from the Iris dataset - I don't think it matters).

asked Sep 21 '15 by machaerus


2 Answers

Below are my cost and gradient computation functions for logistic regression, written using closures and currying (a version for those who are used to a single function that returns both the cost and the gradient):

function cost_gradient(θ, X, y, λ)
    m = length(y)
    # returns a tuple of two closures: the cost function and the in-place gradient
    return (θ::Array) -> begin
        h = sigmoid(X * θ)
        J = (1 / m) * sum(-y .* log(h) .- (1 - y) .* log(1 - h)) + λ / (2 * m) * sum(θ[2:end] .^ 2)
    end, (θ::Array, storage::Array) -> begin
        h = sigmoid(X * θ)
        # write the gradient into storage in place; the bias term θ[1] is not regularized
        storage[:] = (1 / m) * (X' * (h .- y)) + (λ / m) * [0; θ[2:end]]
    end
end

Sigmoid function implementation:

sigmoid(z) = 1.0 ./ (1.0 + exp(-z))
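
Side note: this definition relies on the implicit vectorization of exp and + in older Julia versions (0.4/0.5). On Julia 1.x you would need explicit broadcasting, something like:

sigmoid(z) = 1.0 ./ (1.0 .+ exp.(-z))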

To apply cost_gradient in Optim.jl do the following:

using Optim
#...
# Prerequisites:
# X size is (m,d), where d is the number of training set features
# y size is (m,1)
# λ is the regularization parameter, e.g. 1.5
# ITERATIONS is the number of iterations, e.g. 1000
X = [ones(size(X,1)) X] # add the x_0 = 1.0 column; now X size is (m,d+1)
initialθ = zeros(size(X,2),1) # initialθ size is (d+1,1)
cost, gradient! = cost_gradient(initialθ, X, y, λ)
res = optimize(cost, gradient!, initialθ, method = ConjugateGradient(), iterations = ITERATIONS);
θ = Optim.minimizer(res);
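
If you are on a more recent Optim.jl release (0.9 and later), note that the in-place gradient is expected to take the storage buffer as its first argument, i.e. g!(G, θ) rather than g!(θ, storage), and iteration options are passed via Optim.Options. A rough sketch of the same call under those assumptions (on Julia 1.x the closures above would also need explicit broadcasting, e.g. log., exp., 1 .- y):

res = optimize(cost,
               (G, θ) -> gradient!(θ, G),   # adapt to the newer g!(G, θ) argument order
               vec(initialθ),               # pass a Vector rather than a (d+1)x1 Matrix
               ConjugateGradient(),
               Optim.Options(iterations = ITERATIONS))
θ = Optim.minimizer(res)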

Now you can easily predict (e.g., validate on the training set):

predictions = sigmoid(X * θ) #X size is (m,d+1)
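
If you want hard class labels rather than probabilities, the usual follow-up is to threshold at 0.5, for example:

predicted_labels = predictions .>= 0.5   # BitArray: true for class 1, false for class 0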

You can either try my approach or compare it with your own implementation.

answered by Maciek Leks


Here is an example of unregularized logistic regression that uses the automatic differentiation functionality of Optim.jl. It might help you with your own implementation.

using Optim

const X = rand(100, 3)
const true_β = [5,2,4]
const true_y =  1 ./ (1 + exp(-X*true_β))

function objective(β)
    y = 1 ./ (1 + exp(-X*β))
    return sum((y - true_y).^2)  # Use SSE, non-standard for log. reg.
end

println(optimize(objective, [3.0,3.0,3.0],
                autodiff=true, method=LBFGS()))

Which gives me:

Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [3.0,3.0,3.0]
 * Minimizer: [4.999999945789497,1.9999999853962256,4.0000000047769495]
 * Minimum: 0.000000
 * Iterations: 14
 * Convergence: true
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
   * |g(x)| < 1.0e-08: true
   * Exceeded Maximum Number of Iterations: false
 * Objective Function Calls: 53
 * Gradient Call: 53
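
On current Julia and Optim.jl versions, the syntax has changed a bit (explicit dot-broadcasting, and the autodiff keyword takes a symbol). A rough sketch of the same example under those assumptions:

using Optim

const X = rand(100, 3)
const true_β = [5.0, 2.0, 4.0]
const true_y = 1 ./ (1 .+ exp.(-X * true_β))

# Same SSE objective as above, with explicit broadcasting
objective(β) = sum((1 ./ (1 .+ exp.(-X * β)) .- true_y) .^ 2)

println(optimize(objective, [3.0, 3.0, 3.0], LBFGS(); autodiff = :forward))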

answered by IainDunning