Is it possible to integrate Levenberg-Marquardt optimizer from Tensorflow Graphics with a Tensorflow 2.0 model?

I have a Tensorflow 2.0 tf.keras.Sequential model. Now, my technical specification prescribes using the Levenberg-Marquardt optimizer to fit the model. Tensorflow 2.0 doesn't provide it as an optimizer out of the box, but it is available in the Tensorflow Graphics module.

The tfg.math.optimizer.levenberg_marquardt.minimize function accepts residuals (a residual is a Python callable returning a tensor) and variables (a list of tensors corresponding to my model weights) as parameters.
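
For reference, if I'm reading the signature right, a toy call to that function looks something like this (the problem and names here are just for illustration):

import tensorflow as tf
import tensorflow_graphics.math.optimizer.levenberg_marquardt as lm

# Toy problem: fit a and b so that a * x + b matches y in the least-squares sense.
x = tf.constant([[0.0], [1.0], [2.0], [3.0]])
y = tf.constant([[1.0], [3.0], [5.0], [7.0]])
a = tf.constant(0.0)
b = tf.constant(0.0)

# A residual is a callable that receives the current values of all variables.
def linear_residuals(a, b):
    return a * x + b - y

objective, params = lm.minimize(
    residuals=linear_residuals, variables=(a, b), max_iterations=10)
# params should now hold the fitted values of a and b.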

What would be the best way to convert my model into residuals and variables?

If I understand correctly how the minimize function works, I have to provide two residuals. The first residual must call my model for every training example and aggregate all the results into a tensor. The second residual must return all labels as a single constant tensor. The problem is that the tf.keras.Sequential.predict function returns a NumPy array instead of a tensor, and I believe that if I convert it to a tensor, the minimizer won't be able to calculate Jacobians with respect to the variables.

The same problem exists for the variables: there doesn't seem to be a way to extract all the weights from a model as a list of tensors.

asked Oct 25 '19 by Volodymyr Frolov


1 Answer

There's a major difference between tfg.math.optimizer.levenberg_marquardt.minimize and Keras optimizers from the implementation/API perspective.

Keras optimizers, such as tf.keras.optimizers.Adam, consume gradients as input and update tf.Variables.
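
For comparison, a standard gradient-based training step looks roughly like this (a schematic sketch; the model and data here are just placeholders):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
optimizer = tf.keras.optimizers.Adam()

input_batch = tf.random.normal((8, 4))    # placeholder data
target_batch = tf.random.normal((8, 1))

# One training step: take gradients of a scalar loss and hand them to the
# optimizer, which updates the model's tf.Variables in place.
with tf.GradientTape() as tape:
    predictions = model(input_batch)
    loss = tf.reduce_mean(tf.square(predictions - target_batch))

gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))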

In contrast, tfg.math.optimizer.levenberg_marquardt.minimize essentially unrolls the whole optimization loop in graph mode (using a tf.while_loop construct). It takes initial parameter values and produces updated parameter values, unlike Adam & co., which apply only one iteration at a time and actually change the values of tf.Variables via assign_add.

Stepping back a bit to the theoretical big picture: Levenberg-Marquardt is not a general gradient-descent-like solver for arbitrary nonlinear optimization problems (as Adam is). It specifically addresses nonlinear least-squares optimization, so it's not a drop-in replacement for optimizers like Adam. In gradient descent, we compute the gradient of the loss with respect to the parameters. In Levenberg-Marquardt, we compute the Jacobian of the residuals with respect to the parameters. Concretely, it repeatedly solves the linearized problem Jacobian @ delta_params = residuals for delta_params using tf.linalg.lstsq (which internally uses Cholesky decomposition on the Gram matrix computed from the Jacobian) and applies delta_params as the update.
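
To make that concrete, here is a rough sketch of a single step of that scheme on a toy problem. Note this is the undamped Gauss-Newton variant; Levenberg-Marquardt additionally regularizes the linear solve, and the data here is made up:

import tensorflow as tf

# Toy nonlinear least-squares problem: fit y ≈ a * exp(b * x).
x = tf.constant([0.0, 1.0, 2.0, 3.0])
y = tf.constant([1.0, 2.7, 7.4, 20.1])
params = tf.constant([0.5, 0.5])  # [a, b]

def residual_fn(params):
    a, b = params[0], params[1]
    return a * tf.exp(b * x) - y

with tf.GradientTape() as tape:
    tape.watch(params)
    residuals = residual_fn(params)
jacobian = tape.jacobian(residuals, params)  # shape [num_residuals, num_params]

# Solve the linearized problem  jacobian @ delta = residuals  in the
# least-squares sense, then step against the residuals.
delta = tf.linalg.lstsq(jacobian, residuals[:, tf.newaxis])
params = params - tf.squeeze(delta, axis=-1)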

Note that this lstsq operation has cubic complexity in the number of parameters, so in the case of neural nets it can only be applied to fairly small ones.

Also note that Levenberg-Marquardt is usually applied as a batch algorithm, not a minibatch algorithm like SGD, though there's nothing stopping you from applying the LM iteration on a different minibatch in each iteration.

I think you may only be able to get one iteration at a time out of tfg's LM algorithm, through something like this:

from tensorflow_graphics.math.optimizer.levenberg_marquardt import minimize as lm_minimize

for input_batch, target_batch in dataset:

    def residual_fn(*trainable_params):
        # The passed-in parameter values are intentionally unused: with
        # max_iterations=1 they are still at their initial values (the current
        # model weights), since we only do one Levenberg-Marquardt iteration at a time.
        return model(input_batch) - target_batch

    new_objective_value, new_params = lm_minimize(
        residual_fn, model.trainable_variables, max_iterations=1)
    for var, new_param in zip(model.trainable_variables, new_params):
        var.assign(new_param)

In contrast, I believe the following naive approach, where we assign the model parameters inside the residual function before computing the residuals, will not work:

from tensorflow_graphics.math.optimizer.levenberg_marquardt import minimize as lm_minimize

dataset_iterator = ...

def residual_fn(*params):
    input_batch, target_batch = next(dataset_iterator)
    # Copy the candidate parameter values into the model before evaluating it.
    for var, param in zip(model.trainable_variables, params):
        var.assign(param)
    return model(input_batch) - target_batch

final_objective, final_params = lm_minimize(
    residual_fn, model.trainable_variables, max_iterations=10000)
for var, final_param in zip(model.trainable_variables, final_params):
    var.assign(final_param)

The main conceptual problem is that residual_fn's output has no gradient with respect to its input params, since that dependency only goes through tf.Variable.assign, which is not differentiable. But it might even fail before that, due to constructs (such as the Python-side next(dataset_iterator) call) that are disallowed in graph mode.
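
A quick toy snippet (plain TF, nothing to do with tfg) that illustrates the gradient issue:

import tensorflow as tf

v = tf.Variable(1.0)
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    tape.watch(x)
    v.assign(x)       # the assignment is not differentiable
    y = v * v

print(tape.gradient(y, x))  # None: no gradient flows from y back to x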

Overall I believe it's best to write your own LM optimizer that works directly on tf.Variables, since tfg.math.optimizer.levenberg_marquardt.minimize has a very different API that isn't really suited to optimizing Keras model parameters: you can't directly compute model(input, parameters) - target_value without a tf.assign.
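
For what it's worth, a bare-bones sketch of such a hand-rolled step, applied directly to a Keras model's tf.Variables, might look like the following. This is again the undamped Gauss-Newton flavor (a real LM implementation would add adaptive damping to the solve), and the model and data are just placeholders:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
input_batch = tf.random.normal((16, 4))   # placeholder data
target_batch = tf.random.normal((16, 1))

def least_squares_step(model, input_batch, target_batch):
    variables = model.trainable_variables
    with tf.GradientTape() as tape:
        residuals = tf.reshape(model(input_batch) - target_batch, [-1])
    # Jacobian of the flattened residuals w.r.t. each variable, concatenated
    # into a single [num_residuals, num_params] matrix.
    per_variable_jacobians = tape.jacobian(residuals, variables)
    num_residuals = tf.shape(residuals)[0]
    jacobian = tf.concat(
        [tf.reshape(j, [num_residuals, -1]) for j in per_variable_jacobians],
        axis=1)

    # Solve jacobian @ delta = residuals in the least-squares sense.
    delta = tf.squeeze(tf.linalg.lstsq(jacobian, residuals[:, tf.newaxis]), axis=-1)

    # Slice delta back into per-variable pieces and update the tf.Variables in place.
    offset = 0
    for var in variables:
        size = var.shape.num_elements()
        var.assign_sub(tf.reshape(delta[offset:offset + size], var.shape))
        offset += size

least_squares_step(model, input_batch, target_batch)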

answered Nov 13 '22 by isarandi