I'm trying to do Logistic Regression from Coursera in Julia, but it doesn't work.
The Julia code to calculate the Gradient:
    sigmoid(z) = 1 / (1 + e ^ -z)
    hypotesis(theta, x) = sigmoid(scalar(theta' * x))
    function gradient(theta, x, y)
        (m, n) = size(x)
        h = [hypotesis(theta, x[i,:]') for i in 1:m]
        g = Array(Float64, n, 1)
        for j in 1:n
            g[j] = sum([(h[i] - y[i]) * x[i, j] for i in 1:m])
        end
        g
    end
When this gradient is used, it produces the wrong results. I can't figure out why; the code looks correct to me.
In the full Julia script, the optimal theta is calculated twice, once with my gradient descent implementation and once with the Optim package, and the results are different.
The gradient is correct (up to a scalar multiple, as @roygvib points out). The problem is with the gradient descent.
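To see the "scalar multiple" concretely, one can compare the summed gradient against a finite-difference estimate of the mean cost. The following is only a sketch: it assumes modern Julia (1.x) syntax rather than the older syntax used in the question, it assumes the cost is the Coursera-style mean cross-entropy, and the helper names and data are made up for illustration.

```julia
# Hypothetical helper names; modern Julia (1.x) syntax is assumed.
sigmoid(z) = 1 / (1 + exp(-z))

# Mean cross-entropy cost: the sum over samples divided by m (assumed form).
function cost(theta, X, y)
    h = sigmoid.(X * theta)
    return -sum(y .* log.(h) .+ (1 .- y) .* log.(1 .- h)) / length(y)
end

# Same quantity as the gradient in the question (plain sum, no 1/m), vectorized.
gradient_sum(theta, X, y) = X' * (sigmoid.(X * theta) .- y)

# Central finite differences of `cost`, used only as a sanity check.
function numeric_gradient(theta, X, y; delta = 1e-6)
    g = zeros(length(theta))
    for j in eachindex(theta)
        d = zeros(length(theta))
        d[j] = delta
        g[j] = (cost(theta .+ d, X, y) - cost(theta .- d, X, y)) / (2 * delta)
    end
    return g
end

# Made-up data; the first column is the intercept.
X = [1.0 0.5; 1.0 -1.2; 1.0 2.0; 1.0 0.3]
y = [1.0, 0.0, 1.0, 0.0]
theta = [0.1, -0.2]
m = length(y)

println(gradient_sum(theta, X, y) ./ m)  # summed gradient divided by m
println(numeric_gradient(theta, X, y))   # should agree closely with the line above
```

Dividing the summed gradient by m makes it match the derivative of the mean cost, so the missing factor only rescales the effective step size.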
If you look at the values of the cost function during your gradient descent, you will see a lot of NaN, which probably come from the exponential: lowering the step size (e.g., to 1e-5) will avoid the overflow, but you will have to increase the number of iterations a lot (perhaps to 10_000_000).
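One plausible way the NaN values arise (a guess, since it depends on how the cost is written in the full script): once theta gets large, the sigmoid rounds to exactly 0 or 1, and the cost then evaluates 0 * log(0). The sketch below reproduces this with made-up inputs. The `softplus` rewrite at the end is an optional safeguard that is not part of the original script, and all names here are hypothetical.

```julia
sigmoid(z) = 1 / (1 + exp(-z))

# Per-sample cross-entropy term, written the naive way.
naive_term(h, y) = -(y * log(h) + (1 - y) * log(1 - h))

h_big = sigmoid(40.0)      # exp(-40) is below machine precision, so this rounds to 1.0
h_small = sigmoid(-800.0)  # exp(800) overflows to Inf, so this is 0.0

println(naive_term(h_big, 1.0))    # contains 0 * log(0) = 0 * -Inf, i.e. NaN
println(naive_term(h_small, 0.0))  # same problem at the other extreme

# Optional safeguard (an assumption, not from the original script):
# express the term through log1p(exp(.)) so that log(0) never appears.
softplus(z) = z > 0 ? z + log1p(exp(-z)) : log1p(exp(z))
stable_term(z, y) = y * softplus(-z) + (1 - y) * softplus(z)

println(stable_term(40.0, 1.0))    # finite
println(stable_term(-800.0, 0.0))  # finite
```

Note that `stable_term` takes the linear score z (theta' * x) rather than the hypothesis h, which is what lets it avoid the rounding in the first place.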
A better (faster) solution would be to let the step size vary. For instance, one could multiply the step size by 1.1 if the cost function improves after a step (the optimum still looks far away in this direction: we can go faster), and divide it by 2 if it does not (we went too fast and ended up past the minimum).
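A minimal sketch of this varying step size, assuming modern Julia (1.x) and a mean cross-entropy cost. The function names, the starting step of 0.1, the iteration count, and the data are all made up for illustration. Discarding a step that does not improve the cost is one possible choice; it is not the only way to implement the rule.

```julia
sigmoid(z) = 1 / (1 + exp(-z))
softplus(z) = z > 0 ? z + log1p(exp(-z)) : log1p(exp(z))

# Mean cross-entropy cost, written via softplus so it stays finite.
function cost(theta, X, y)
    z = X * theta
    return sum(y .* softplus.(-z) .+ (1 .- y) .* softplus.(z)) / length(y)
end

# Gradient of the mean cost.
grad(theta, X, y) = X' * (sigmoid.(X * theta) .- y) ./ length(y)

function adaptive_descent(X, y; step = 0.1, iterations = 5_000)
    theta = zeros(size(X, 2))
    current = cost(theta, X, y)
    for _ in 1:iterations
        candidate = theta .- step .* grad(theta, X, y)
        new_cost = cost(candidate, X, y)
        if new_cost < current
            theta, current = candidate, new_cost  # improvement: keep the step
            step *= 1.1                            # and try a larger one next time
        else
            step /= 2                              # no improvement: discard and shrink
        end
    end
    return theta, current
end

# Made-up, non-separable data; the first column is the intercept.
X = [1.0 0.5; 1.0 -1.2; 1.0 2.0; 1.0 0.3; 1.0 -0.4; 1.0 1.1]
y = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

theta_hat, final_cost = adaptive_descent(X, y)
println(theta_hat)
println(final_cost)
```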
One could also do a line search in the direction of the gradient to find the best step size (but this is time-consuming and can be replaced by approximations, e.g., Armijo's rule).
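Armijo's rule can be sketched as a backtracking search: start from a generous step and shrink it until the cost decreases by at least a small fraction of what the gradient predicts. The constants below (`c = 1e-4`, halving the step, at most 50 halvings) are common defaults chosen here as assumptions, and the function names and data are hypothetical.

```julia
using LinearAlgebra: dot

sigmoid(z) = 1 / (1 + exp(-z))
softplus(z) = z > 0 ? z + log1p(exp(-z)) : log1p(exp(z))

function cost(theta, X, y)
    z = X * theta
    return sum(y .* softplus.(-z) .+ (1 .- y) .* softplus.(z)) / length(y)
end

grad(theta, X, y) = X' * (sigmoid.(X * theta) .- y) ./ length(y)

# Backtracking search along -g: accept the first step t that satisfies
# f(theta - t*g) <= f(theta) - c * t * ||g||^2 (the Armijo condition).
function armijo_step(f, theta, g; step = 1.0, shrink = 0.5, c = 1e-4, max_halvings = 50)
    f0 = f(theta)
    slope = dot(g, g)
    for _ in 1:max_halvings
        if f(theta .- step .* g) <= f0 - c * step * slope
            return step
        end
        step *= shrink
    end
    return step  # fall back to the smallest step tried
end

function descend(X, y; iterations = 200)
    theta = zeros(size(X, 2))
    for _ in 1:iterations
        g = grad(theta, X, y)
        t = armijo_step(th -> cost(th, X, y), theta, g)
        theta = theta .- t .* g
    end
    return theta
end

# Made-up, non-separable data; the first column is the intercept.
X = [1.0 0.5; 1.0 -1.2; 1.0 2.0; 1.0 0.3; 1.0 -0.4; 1.0 1.1]
y = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

theta_hat = descend(X, y)
println(cost(theta_hat, X, y))
```

Each backtracking step costs one extra evaluation of the cost function, which is much cheaper than an exact line search.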
Rescaling the predictive variables also helps.
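A sketch of one common way to rescale, namely standardizing each column to zero mean and unit standard deviation. The function name and the numbers are made up. The intercept column of ones is added after rescaling, because a constant column has zero standard deviation.

```julia
using Statistics: mean, std

# Standardize each column; also return the column means and standard
# deviations so that new data can be transformed in the same way.
function standardize(features)
    mu = mean(features, dims = 1)
    sigma = std(features, dims = 1)
    return (features .- mu) ./ sigma, mu, sigma
end

# Made-up raw predictors on a large scale.
features = [34.6 78.0; 30.3 43.9; 35.8 72.9; 60.2 86.3]

Z, mu, sigma = standardize(features)
X = hcat(ones(size(Z, 1)), Z)  # prepend the intercept column afterwards

println(mean(Z, dims = 1))  # approximately zero in each column
println(std(Z, dims = 1))   # approximately one in each column
```

The fitted theta then refers to the rescaled variables, so predictions on new data need the same `mu` and `sigma` applied first.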