
Custom TensorFlow Keras optimizer

Suppose I want to write a custom optimizer class that conforms to the tf.keras API (using TensorFlow version >= 2.0). I am confused about the documented way to do this versus what's done in implementations.

The documentation for tf.keras.optimizers.Optimizer states,

  ### Write a customized optimizer.
  If you intend to create your own optimization algorithm, simply inherit from
  this class and override the following methods:

    - resource_apply_dense (update variable given gradient tensor is dense)
    - resource_apply_sparse (update variable given gradient tensor is sparse)
    - create_slots (if your optimizer algorithm requires additional variables)

However, the current tf.keras.optimizers.Optimizer implementation does not define a resource_apply_dense method, but it does define a private-looking _resource_apply_dense method stub. Similarly, there are no resource_apply_sparse or create_slots methods, but there are a _resource_apply_sparse method stub and a _create_slots method call.

In official tf.keras.optimizers.Optimizer subclasses (using tf.keras.optimizers.Adam as an example), there are _resource_apply_dense, _resource_apply_sparse, and _create_slots methods, and there are no such methods without the leading underscore.

There are similar leading-underscore methods in slightly-less-official tf.keras.optimizers.Optimizer subclasses (e.g., tfa.optimizers.MovingAverage from TensorFlow Addons: _resource_apply_dense, _resource_apply_sparse, _create_slots).

Another confounding point for me is that some of the TensorFlow Addons optimizers also override the apply_gradients method (e.g., tfa.optimizers.MovingAverage), whereas the tf.keras.optimizers optimizers do not.

Moreover, I noticed that the apply_gradients method of tf.keras.optimizers.Optimizer calls _create_slots, but the base tf.keras.optimizers.Optimizer class does not have a _create_slots method. So, it seems that a _create_slots method must be defined in an optimizer subclass if that subclass does not override apply_gradients.


Questions

What is the correct way to subclass a tf.keras.optimizers.Optimizer? Specifically,

  1. Does the tf.keras.optimizers.Optimizer documentation listed at the top simply mean to override the leading-underscore versions of the methods they mention (e.g., _resource_apply_dense instead of resource_apply_dense)? If so, are there any API guarantees about these private-looking methods not changing their behavior in future versions of TensorFlow? What are the signatures of these methods?
  2. When would one override apply_gradients in addition to the _resource_apply_[dense|sparse] methods?

Edit. Opened issue on GitHub: #36449

asked Nov 08 '19 by Artem Mavrin




2 Answers

Update: TF2.2 forced me to clean up all implementations - so now they can be used as a reference for TF best practices. Also added a section below on _get_hyper vs. _set_hyper.


I've implemented Keras AdamW in all major TF & Keras versions - I invite you to examine optimizers_v2.py. Several points:

  • You should inherit OptimizerV2, which is actually what you linked; it's the latest and current base class for tf.keras optimizers
  • You are correct in (1) - this is a documentation mistake; the methods are private, as they aren't meant to be used by the user directly.
  • apply_gradients (or any other method) is only overridden if the default doesn't accomplish what's needed for a given optimizer; in your linked example, it's just a one-liner add-on to the original
  • "So, it seems that a _create_slots method must be defined in an optimizer subclass if that subclass does not override apply_gradients" - the two are unrelated; it's coincidental.

  • What is the difference between _resource_apply_dense and _resource_apply_sparse?

The latter deals with sparse gradients - e.g. from an Embedding layer, where only the looked-up rows receive gradients - and the former with everything else; see the sketch below.

  • When should I use _create_slots()?

When the optimizer needs its own per-variable state tf.Variables ("slots"); example: the first and second order moments of the weights (e.g. Adam). Slots are created with add_slot().
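
To make this concrete, here is a minimal sketch of such a subclass - a toy SGD-with-momentum optimizer, not taken from the keras-adamw code linked above; the class name SimpleMomentum and the slot name "velocity" are made up for illustration. It overrides only the private methods plus get_config, using _set_hyper/_get_hyper for hyperparameters and add_slot/get_slot for per-variable state:

  import tensorflow as tf

  class SimpleMomentum(tf.keras.optimizers.Optimizer):
    """Toy SGD-with-momentum optimizer, written only to illustrate the overrides."""

    def __init__(self, learning_rate=0.01, momentum=0.9, name="SimpleMomentum", **kwargs):
      super().__init__(name, **kwargs)
      # _set_hyper accepts floats, callables or tensors; `lr` kept for backward compat.
      self._set_hyper("learning_rate", kwargs.get("lr", learning_rate))
      self._set_hyper("momentum", momentum)

    def _create_slots(self, var_list):
      # One per-variable state ("slot") variable holding each weight's velocity.
      for var in var_list:
        self.add_slot(var, "velocity")

    def _resource_apply_dense(self, grad, var, apply_state=None):
      lr = tf.cast(self._get_hyper("learning_rate"), var.dtype)
      mu = tf.cast(self._get_hyper("momentum"), var.dtype)
      v = self.get_slot(var, "velocity")
      v_new = v.assign(mu * v - lr * grad)   # update the velocity first
      return var.assign_add(v_new)           # then apply it to the weight

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
      # `grad` only covers the rows selected by `indices` (e.g. an Embedding lookup).
      lr = tf.cast(self._get_hyper("learning_rate"), var.dtype)
      mu = tf.cast(self._get_hyper("momentum"), var.dtype)
      v = self.get_slot(var, "velocity")
      v_rows = mu * tf.gather(v, indices) - lr * grad
      v_update = v.scatter_update(tf.IndexedSlices(v_rows, indices))
      with tf.control_dependencies([v_update]):
        return var.scatter_add(tf.IndexedSlices(v_rows, indices))

    def get_config(self):
      config = super().get_config()
      config.update({
          "learning_rate": self._serialize_hyperparameter("learning_rate"),
          "momentum": self._serialize_hyperparameter("momentum"),
      })
      return config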


_get_hyper vs. _set_hyper: they enable setting and getting Python literals (int, str, etc), callables, and tensors. They exist largely for convenience: anything set via _set_hyper can be retrieved via _get_hyper, avoiding repeating boilerplate code. I dedicated a Q&A to it here.
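
A quick usage sketch (again hypothetical, building on the SimpleMomentum class above): anything stored with _set_hyper can be read back with _get_hyper, optionally cast to a dtype, and the subclass plugs straight into compile()/fit():

  opt = SimpleMomentum(learning_rate=0.05)
  opt._set_hyper("momentum", 0.8)                # overwrite a hyperparameter after construction
  print(opt._get_hyper("momentum"))              # reads back 0.8 (possibly wrapped as a tf.Variable)
  print(opt._get_hyper("momentum", tf.float32))  # same value, cast to the requested dtype

  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
  model.compile(optimizer=opt, loss="mse")
  model.fit(tf.random.normal((32, 4)), tf.random.normal((32, 1)), epochs=1, verbose=0)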

answered by OverLordGoldDragon


  1. Yes, this looks to be a documentation error. The leading-underscore names are the correct methods to override. Related is the non-Keras Optimizer, which has these all defined but not implemented in the base class (the tf.keras counterparts are sketched after this list): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/optimizer.py
  def _create_slots(self, var_list):
    """Create all slots needed by the variables.
    Args:
      var_list: A list of `Variable` objects.
    """
    # No slots needed by default
    pass

  def _resource_apply_dense(self, grad, handle):
    """Add ops to apply dense gradients to the variable `handle`.
    Args:
      grad: a `Tensor` representing the gradient.
      handle: a `Tensor` of dtype `resource` which points to the variable
       to be updated.
    Returns:
      An `Operation` which updates the value of the variable.
    """
    raise NotImplementedError()

  def _resource_apply_sparse(self, grad, handle, indices):
    """Add ops to apply sparse gradients to the variable `handle`.
    Similar to `_apply_sparse`, the `indices` argument to this method has been
    de-duplicated. Optimizers which deal correctly with non-unique indices may
    instead override `_resource_apply_sparse_duplicate_indices` to avoid this
    overhead.
    Args:
      grad: a `Tensor` representing the gradient for the affected indices.
      handle: a `Tensor` of dtype `resource` which points to the variable
       to be updated.
      indices: a `Tensor` of integral type representing the indices for
       which the gradient is nonzero. Indices are unique.
    Returns:
      An `Operation` which updates the value of the variable.
    """
    raise NotImplementedError()
  2. I don't know about apply_gradients. For one thing, if you do override it, the code mentions that a per-replica DistributionStrategy could be "dangerous"
    # TODO(isaprykin): When using a DistributionStrategy, and when an
    # optimizer is created in each replica, it might be dangerous to
    # rely on some Optimizer methods.  When such methods are called on a
    # per-replica optimizer, an exception needs to be thrown.  We do
    # allow creation per-replica optimizers however, because the
    # compute_gradients()->apply_gradients() sequence is safe.
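
Note that the signatures quoted in point 1 come from the non-Keras (tf.compat.v1) Optimizer. In the tf.keras OptimizerV2 base class the stubs are analogous but, in recent 2.x releases, also receive an apply_state dict of precomputed coefficients, which a subclass may accept as an optional argument. The following is a sketch of the shapes of the overrides as used by e.g. tf.keras.optimizers.Adam, not a copy of the source:

  # Shapes of the overrides in a tf.keras (OptimizerV2) subclass:
  def _create_slots(self, var_list):
    """Create per-variable state, e.g. self.add_slot(var, "m")."""

  def _resource_apply_dense(self, grad, var, apply_state=None):
    """Update `var` from the dense gradient `grad`; return the update op."""

  def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
    """Update only the rows of `var` selected by `indices`; return the update op."""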
answered by Tyler