I see it in the __init__ of e.g. the Adam optimizer: self._set_hyper('beta_1', beta_1). There are also _get_hyper and _serialize_hyperparameter throughout the code. I don't see these in Keras optimizers - are they optional? When should or shouldn't they be used when creating custom optimizers?
They enable setting and getting Python literals (int, str, etc.), callables, and tensors. Usage is for convenience and consistency: anything set via _set_hyper can be retrieved via _get_hyper, avoiding repeated boilerplate. I've implemented Keras AdamW in all major TF & Keras versions, and will use it as a reference.
t_cur is a tf.Variable. Each time we "set" it, we must invoke K.set_value; if we do self.t_cur = 5, this will destroy the tf.Variable and break optimizer functionality. If instead we used model.optimizer._set_hyper('t_cur', 5), it would set it appropriately - but this requires the attribute to have been defined via _set_hyper previously.

Both _get_hyper and _set_hyper enable programmatic treatment of attributes - e.g., we can run a for-loop over a list of attribute names, getting or setting each with just _get_hyper and _set_hyper, whereas otherwise we'd need to code conditionals and type checks. Also, _get_hyper(name) requires that name was previously set via _set_hyper.
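A minimal sketch of both points, using a stock optimizer_v2-based Adam (tf.keras.optimizers.Adam up to TF 2.10, tf.keras.optimizers.legacy.Adam after); t_cur is specific to AdamW, so the built-in hypers are used here instead:

```python
import tensorflow as tf

opt = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999)
opt._create_hypers()  # normally done lazily on first apply_gradients

# Programmatic access: one loop instead of per-attribute conditionals
for name in ('learning_rate', 'beta_1', 'beta_2'):
    print(name, opt._get_hyper(name))

# Re-setting an existing hyper assigns to the underlying tf.Variable
# (K.set_value internally) rather than destroying and replacing it
opt._set_hyper('beta_1', 0.95)
print(opt._get_hyper('beta_1'))  # <tf.Variable 'beta_1' ... numpy=0.95>
```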
_get_hyper enables typecasting via dtype=. Example: beta_1_t in default Adam is cast to the same numeric type as var (e.g. a layer weight), which is required for some ops. This is again a convenience, as we could typecast manually (math_ops.cast).
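Roughly the pattern from default Adam's update step, shown standalone here (the var below is a hypothetical stand-in for a layer weight):

```python
import tensorflow as tf

opt = tf.keras.optimizers.Adam(beta_1=0.9)
var = tf.Variable(tf.zeros(3, dtype=tf.float16))  # e.g. a mixed-precision weight

# Cast to the weight's dtype in one call, as in Adam's _resource_apply_dense
beta_1_t = opt._get_hyper('beta_1', var.dtype.base_dtype)
print(beta_1_t.dtype)  # float16

# Manual equivalent
beta_1_manual = tf.cast(opt._get_hyper('beta_1'), var.dtype.base_dtype)
```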
_set_hyper enables the use of _serialize_hyperparameter, which retrieves the Python value (int, float, etc.) of callables, tensors, or already-Python values. The name stems from the need to convert tensors and callables to Python types for e.g. pickling or JSON-serializing - but it can also serve as a convenience for inspecting tensor values in Graph execution.
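This is the pattern built-in optimizers use to populate get_config; a small sketch:

```python
import tensorflow as tf

opt = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9)

# Returns plain Python numbers, even when the hypers are tf.Variables
config = {
    'learning_rate': opt._serialize_hyperparameter('learning_rate'),
    'beta_1': opt._serialize_hyperparameter('beta_1'),
}
print(config)  # {'learning_rate': 0.001, 'beta_1': 0.9}
```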
Lastly, everything instantiated via _set_hyper gets assigned to the optimizer._hyper dictionary, which is then iterated over in _create_hypers. The else in that loop casts all Python numerics to tensors - so _set_hyper will not create int, float, etc. attributes. Worth noting is the aggregation= kwarg, whose documentation reads: "Indicates how a distributed variable will be aggregated". This is the part that's a bit more than "for convenience" (lots of code to replicate).
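Condensed from TF's optimizer_v2 source (exact details vary by version), the loop looks roughly like this:

```python
# Condensed sketch of optimizer_v2._create_hypers (details vary by TF version)
def _create_hypers(self):
    for name, value in sorted(self._hyper.items()):
        if isinstance(value, (tf.Tensor, tf.Variable)) or callable(value):
            continue  # already a tensor/variable/callable: keep as-is
        else:
            # Python numerics become scalar tf.Variables here
            self._hyper[name] = self.add_weight(
                name,
                shape=[],
                trainable=False,
                initializer=value,  # numeric initializer -> constant init
                aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA)
```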
_set_hyper has a limitation: it does not allow specifying dtype. If the add_weight approach in _create_hypers is desired with a dtype, then add_weight should be called directly.

When to use vs. not use: use _set_hyper if the attribute is used by the optimizer via TensorFlow ops - i.e. if it needs to be a tf.Variable. For example, epsilon is set as a regular Python attribute, as it's never needed as a tf.Variable.
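Putting the rule together - a minimal sketch, assuming the optimizer_v2 base class (tf.keras.optimizers.Optimizer up to TF 2.10, tf.keras.optimizers.legacy.Optimizer after); the update methods (_resource_apply_dense etc.) are omitted:

```python
import tensorflow as tf

class MyOptimizer(tf.keras.optimizers.Optimizer):
    def __init__(self, learning_rate=0.01, beta_1=0.9, epsilon=1e-7,
                 name='MyOptimizer', **kwargs):
        super().__init__(name, **kwargs)
        # Used in TF ops each step -> register as hypers, so they become
        # tf.Variables settable/gettable via the _hyper machinery
        self._set_hyper('learning_rate', learning_rate)
        self._set_hyper('beta_1', beta_1)
        # Only ever needed as a fixed Python number -> plain attribute
        self.epsilon = epsilon
```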