
How to create an optimizer in Tensorflow

I want to write a new optimization algorithm for my network in TensorFlow. I hope to implement the Levenberg-Marquardt optimization algorithm, which is currently excluded from the TF API. I found poor documentation on how to write a custom optimizer, so I ask if someone can give me any advice. Thanks.

asked Jul 18 '16 by Alberto Manzini



2 Answers

The simplest example of an optimizer is probably the gradient descent optimizer. It shows how one creates an instance of the basic optimizer class. The optimizer base class documentation explains what the methods do.
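For reference, a minimal sketch of using that built-in gradient descent optimizer (TF 1.x API; the toy loss is just for illustration):

import tensorflow as tf

x = tf.Variable(3.0)
loss = tf.square(x)  # minimize x^2
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_op)
    print(sess.run(x))  # x has moved toward 0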

The python side of the optimizers adds new nodes to the graph that compute and apply the gradients being back-propagated. It supplies the parameters that get passed to the ops and does some of the high-level management of the optimizer. Then, you need the actual "Apply" op.

Ops have both a Python and a C++ component. Writing a training op follows the same general process as adding any Op to TensorFlow, just specialized for training.
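As a hedged sketch of the Python side of that process (the .so filename and op name here are hypothetical, assuming you have already compiled a custom kernel as described in the adding-an-op docs):

import tensorflow as tf

# Load the compiled shared library containing the custom training op;
# the path is illustrative, not a real file.
my_ops = tf.load_op_library('./my_training_op.so')
# The generated Python wrapper is then available as an attribute,
# e.g. my_ops.my_apply_gradient(...) if the op was registered
# in C++ as "MyApplyGradient".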

For an example set of training ops that compute and apply gradients, see python/training/training_ops.py - this is the Python glue for the actual training ops. Note that the code here is mostly about shape inference - the computation lives in the C++ kernels.

The actual math for applying the gradients is handled by an Op (recalling that, in general, ops are written in C++). In this case, the apply gradients ops are defined in core/kernels/training_ops.cc. You can see, for example, the implementation of ApplyGradientDescentOp in there, which references a functor ApplyGradientDescent:

var.device(d) -= grad * lr(); 
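From Python, the same fused kernel is exposed through the generated wrapper in training_ops.py. A minimal sketch (TF 1.x, calling the wrapper directly rather than going through tf.train.GradientDescentOptimizer, which is what you would normally do):

import tensorflow as tf
from tensorflow.python.training import training_ops

var = tf.Variable([1.0, 2.0])
grad = tf.constant([0.1, 0.2])
# ApplyGradientDescent computes: var -= alpha * delta
update = training_ops.apply_gradient_descent(
    var, alpha=0.5, delta=grad, use_locking=False)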

The implementation of the Op itself follows the implementation of any other op as described in the adding-an-op docs.

answered Sep 22 '22 by dga


Before running the TensorFlow Session, one should instantiate an Optimizer as seen below:

# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

tf.train.GradientDescentOptimizer(learning_rate) creates an object of the class GradientDescentOptimizer which, as the name says, implements the gradient descent algorithm.

The method minimize() is called with "cost" as its parameter and consists of two steps: compute_gradients() and then apply_gradients().
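In other words, minimize(cost) is shorthand for two explicit calls that you can also make yourself:

# Equivalent to optimizer.minimize(cost), split into its two steps
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cost)    # list of (gradient, variable) pairs
train_op = optimizer.apply_gradients(grads_and_vars)  # op that applies the updates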

For most (custom) optimizer implementations, the method apply_gradients() needs to be adapted.

This method relies on the new Optimizer class, which we will create, to implement the following methods: _create_slots(), _prepare(), _apply_dense(), and _apply_sparse().

  • _create_slots() and _prepare() create and initialise additional variables, such as momentum.

  • _apply_dense() and _apply_sparse() implement the actual Ops, which update the variables.

Ops are generally written in C++. Without having to change the C++ headers yourself, you can still return a Python wrapper for some Ops through these methods. This is done as follows:

def _create_slots(self, var_list):
    # Create slots for allocation and later management of additional
    # variables associated with the variables to train.
    # For example: the first and second moments.
    '''
    for v in var_list:
        self._zeros_slot(v, "m", self._name)
        self._zeros_slot(v, "v", self._name)
    '''

def _apply_dense(self, grad, var):
    # Define your favourite variable update.
    # For example:
    '''
    # Here we apply gradient descent by subtracting from the variables
    # the gradient times the learning_rate (defined in __init__)
    var_update = state_ops.assign_sub(var, self.learning_rate * grad)
    '''
    # The trick is now to pass the Ops in the control_flow_ops and
    # eventually group any particular computation of the slots you
    # wish to keep track of.
    # For example:
    '''
    m_t = ...m... # do something with m and grad
    v_t = ...v... # do something with v and grad
    '''
    return control_flow_ops.group(*[var_update, m_t, v_t])
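Putting those pieces together, here is a minimal, self-contained sketch (assuming the TF 1.x optimizer API; the class name SimpleSGD is illustrative, not part of TF) of a custom optimizer that just performs plain gradient descent:

from tensorflow.python.framework import ops
from tensorflow.python.ops import control_flow_ops, math_ops, state_ops
from tensorflow.python.training import optimizer

class SimpleSGD(optimizer.Optimizer):
    def __init__(self, learning_rate=0.01, use_locking=False, name="SimpleSGD"):
        super(SimpleSGD, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._lr_t = None  # tensor version, created in _prepare()

    def _prepare(self):
        # Convert Python-number hyperparameters to tensors.
        self._lr_t = ops.convert_to_tensor(self._lr, name="learning_rate")

    def _create_slots(self, var_list):
        # Plain SGD needs no extra state; a momentum-style optimizer
        # would call self._zeros_slot(v, "m", self._name) here.
        pass

    def _apply_dense(self, grad, var):
        lr_t = math_ops.cast(self._lr_t, var.dtype.base_dtype)
        var_update = state_ops.assign_sub(var, lr_t * grad)
        return control_flow_ops.group(*[var_update])

    def _apply_sparse(self, grad, var):
        raise NotImplementedError("Sparse gradients not supported in this sketch.")

It can then be used like any built-in optimizer, e.g. train_op = SimpleSGD(learning_rate=0.1).minimize(cost).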

For a more detailed explanation with an example, see this blog post: https://www.bigdatarepublic.nl/custom-optimizer-in-tensorflow/

answered Sep 22 '22 by Benoit Descamps