I would like to maximize a function with one parameter.
So I run gradient descent (or, ascent actually): I start with an initial parameter and keep adding the gradient (times some learning rate factor that gets smaller and smaller), re-evaluate the gradient given the new parameter, and so on until convergence.
But there is one problem: My parameter must stay positive, so it is not supposed to become <= 0 because my function will be undefined. My gradient search will sometimes go into such regions though (when it was positive, the gradient told it to go a bit lower, and it overshoots).
And to make things worse, the gradient at such a point might be negative, driving the search toward even more negative parameter values. (The reason is that the objective function contains logs, but the gradient doesn't.)
What are some good (simple) algorithms that deal with this constrained optimization problem? I'm hoping for just a simple fix to my algorithm. Or maybe ignore the gradient and do some kind of line search for the optimal parameter?
Without knowing more about your problem, it's hard to give specific advice. Your gradient ascent algorithm may not be particularly suitable for your function space. However, given that's what you've got, here's one tweak that would help.
You're following what you believe is an ascending gradient. But when you move forwards in the direction of the gradient, you discover you have fallen into a pit of negative value. This implies that there was a nearby local maximum, but also a very sharp negative gradient cliff. The obvious fix is to backtrack to your previous position, and take a smaller step (e.g half the size). If you still fall in, repeat with a still smaller step. This will iterate until you find the local maximum at the edge of the cliff.
The problem is, there is no guarantee that your local maximum is actually global (unless you know more about your function than you are sharing). This is the main limitation of naive gradient ascent - it always fixes on the first local maximum and converges to it. If you don't want to switch to a more robust algorithm, one simple approach that could help is to run n iterations of your code, starting each time from random positions in the function space, and keeping the best maximum you find overall. This Monte Carlo approach increases the odds that you will stumble on the global maximum, at the cost of increasing your run time by a factor n. How effective this is will depend on the 'bumpiness' of your objective function.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With