I have seen the straight through estimator (STE) in many neural network papers, e.g. this and this, but I cannot understand the concept. Could anyone explain STE or refer me to a simple resource?
A straight through estimator is a way of estimating gradients for a threshold operation in a neural network. The threshold could be as simple as a step function, for example one that outputs 1 for positive inputs and 0 otherwise.
The derivative of this threshold function is 0 wherever it is defined, so during back-propagation the network will not learn anything: it receives zero gradients and the weights are never updated.
The concept of a straight through estimator is that, during back-propagation, you set the gradient at the input of the threshold function equal to the gradient at its output, disregarding the derivative of the threshold function itself. This has been shown to perform well in the results (Figure 2) of the paper you referenced.
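To make the idea concrete, here is a minimal sketch in plain Python with hand-written back-propagation. The single-weight "network", the squared-error loss, the choice of a step at 0, and all names (`step`, `grad_w`, `train`) are assumptions made for illustration; they are not taken from the referenced papers.

```python
def step(x):
    """Hard threshold: 1.0 for positive input, 0.0 otherwise."""
    return 1.0 if x > 0 else 0.0


def forward(w, x):
    """Single neuron: pre-activation z = w * x, output y = step(z)."""
    z = w * x
    return z, step(z)


def grad_w(w, x, target, use_ste):
    """Gradient of the loss 0.5 * (y - target)**2 with respect to w."""
    z, y = forward(w, x)
    dloss_dy = y - target
    # The true derivative of the step is 0 away from the jump.
    # The STE substitutes 1, so the gradient passes straight through.
    dy_dz = 1.0 if use_ste else 0.0
    dloss_dz = dloss_dy * dy_dz
    return dloss_dz * x


def train(w, x, target, lr, steps, use_ste):
    """Plain gradient descent on the single weight w."""
    for _ in range(steps):
        w -= lr * grad_w(w, x, target, use_ste)
    return w


# Illustrative values: the neuron starts on the wrong side of the threshold.
w0, x, target = -0.5, 1.0, 1.0
w_true = train(w0, x, target, lr=0.1, steps=20, use_ste=False)
w_ste = train(w0, x, target, lr=0.1, steps=20, use_ste=True)

print("true derivative:", w_true, "output:", forward(w_true, x)[1])
print("straight through:", w_ste, "output:", forward(w_ste, x)[1])
```

With the true derivative the weight never moves, because every gradient is multiplied by 0. With the straight through estimator the weight keeps being pushed until the thresholded output matches the target, at which point the loss gradient itself becomes 0 and training stops. Note that the forward pass still uses the hard threshold in both cases; only the backward pass changes.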