I'm trying to implement a simple regularized logistic regression algorithm in Julia. I'd like to use Optim.jl library to minimize my cost function, but I can't get it to work.
My cost function and gradient are as follows:
function cost(X, y, theta, lambda)
m = length(y)
h = sigmoid(X * theta)
reg = (lambda / (2*m)) * sum(theta[2:end].^2)
J = (1/m) * sum( (-y).*log(h) - (1-y).*log(1-h) ) + reg
return J
end
function grad(X, y, theta, lambda, gradient)
m = length(y)
h = sigmoid(X * theta)
# gradient = zeros(size(theta))
gradient = (1/m) * X' * (h - y)
gradient[2:end] = gradient[2:end] + (lambda/m) * theta[2:end]
return gradient
end
(Where theta
is a vector of parameters for the hypothesis function and lambda
is a regularization parameter.)
Then, according to the instructions given here: https://github.com/JuliaOpt/Optim.jl I try to call the optimization function like this:
# those are handle functions I define to pass them as arguments:
c(theta::Vector) = cost(X, y, theta, lambda)
g!(theta::Vector, gradient::Vector) = grad(X, y, theta, lambda, gradient)
# then I do
optimize(c,some_initial_theta)
# or maybe
optimize(c,g!,initial_theta,method = :l_bfgs) # try a different algorithm
In both cases it says that it fails to converge and the output looks kind of awkard:
julia> optimize(c,initial_theta)
Results of Optimization Algorithm
* Algorithm: Nelder-Mead
* Starting Point: [0.0,0.0,0.0,0.0,0.0]
* Minimum: [1.7787162051775145,3.4584135105727145,-6.659680628594007,4.776952006060713,1.5034743945407143]
* Value of Function at Minimum: -Inf
* Iterations: 1000
* Convergence: false
* |x - x'| < NaN: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: false
* |g(x)| < NaN: false
* Exceeded Maximum Number of Iterations: true
* Objective Function Calls: 1013
* Gradient Call: 0
julia> optimize(c,g!,initial_theta,method = :l_bfgs)
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.0,0.0,0.0,0.0,0.0]
* Minimum: [-6.7055e-320,-2.235e-320,-6.7055e-320,-2.244e-320,-6.339759952602652e-7]
* Value of Function at Minimum: 0.693148
* Iterations: 1
* Convergence: false
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: false
* |g(x)| < 1.0e-08: false
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 75
* Gradient Call: 75
Is my method (from my first code listing) incorrect? Or am I misusing Optim.jl functions? Either way, what is the proper way to define and minimize the cost function here?
It's my first time with Julia and probably I'm doing something terribly wrong, but I can't tell what exactly. Any help will be appreciated!
X
and y
are the training set, X
is a 90x5 matrix, y
a 90x1 vector (namely, my training set is taken from Iris - I don't think it matters).
Below you find my cost and gradient computation functions for Logistic Regression using closures and currying (version for those who got used to a function that returns the cost and gradient):
function cost_gradient(θ, X, y, λ)
m = length(y)
return (θ::Array) -> begin
h = sigmoid(X * θ)
J = (1 / m) * sum(-y .* log(h) .- (1 - y) .* log(1 - h)) + λ / (2 * m) * sum(θ[2:end] .^ 2)
end, (θ::Array, storage::Array) -> begin
h = sigmoid(X * θ)
storage[:] = (1 / m) * (X' * (h .- y)) + (λ / m) * [0; θ[2:end]]
end
end
Sigmoid function implementation:
sigmoid(z) = 1.0 ./ (1.0 + exp(-z))
To apply cost_gradient
in Optim.jl do the following:
using Optim
#...
# Prerequisites:
# X size is (m,d), where d is the number of training set features
# y size is (m,1)
# λ as the regularization parameter, e.g 1.5
# ITERATIONS number of iterations, e.g. 1000
X=[ones(size(X,1)) X] #add x_0=1.0 column; now X size is (m,d+1)
initialθ = zeros(size(X,2),1) #initialTheta size is (d+1, 1)
cost, gradient! = cost_gradient(initialθ, X, y, λ)
res = optimize(cost, gradient!, initialθ, method = ConjugateGradient(), iterations = ITERATIONS);
θ = Optim.minimizer(res);
Now, you can easily predict (e.g. training set validation):
predictions = sigmoid(X * θ) #X size is (m,d+1)
Either try my approach or compare it with your implementation.
Here is an example of unregularized logistic regression that uses the autodifferentiation functionality of Optim.jl. It might help you with your own implementation.
using Optim
const X = rand(100, 3)
const true_β = [5,2,4]
const true_y = 1 ./ (1 + exp(-X*true_β))
function objective(β)
y = 1 ./ (1 + exp(-X*β))
return sum((y - true_y).^2) # Use SSE, non-standard for log. reg.
end
println(optimize(objective, [3.0,3.0,3.0],
autodiff=true, method=LBFGS()))
Which gives me
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [3.0,3.0,3.0]
* Minimizer: [4.999999945789497,1.9999999853962256,4.0000000047769495]
* Minimum: 0.000000
* Iterations: 14
* Convergence: true
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: false
* |g(x)| < 1.0e-08: true
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 53
* Gradient Call: 53
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With