Their documentation is short, and both refer to the same paper. Is there a difference in what the two functions implement? If not, is one of them slated to be made obsolete by the other, and which of the two is recommended for use?
According to the performance guide:
The non-fused batch norm does computations using several individual Ops. Fused batch norm combines the individual operations into a single kernel, which runs faster.
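To make the "several individual Ops" concrete, here is a minimal NumPy sketch of what the non-fused path computes step by step. The function name and the op names in the comments are illustrative, not TensorFlow's actual kernel names; the fused version performs the same math inside a single kernel launch.

```python
import numpy as np

def batch_norm_unfused(x, gamma, beta, eps=1e-3):
    """Batch norm written as separate steps, mirroring the non-fused graph.

    Each line below roughly corresponds to a distinct op that the
    non-fused implementation would dispatch individually.
    """
    mean = x.mean(axis=0)                      # reduction over the batch
    var = ((x - mean) ** 2).mean(axis=0)       # second reduction
    inv_std = 1.0 / np.sqrt(var + eps)         # elementwise rsqrt
    x_hat = (x - mean) * inv_std               # normalize
    return gamma * x_hat + beta                # scale and shift

# Tiny example: two samples, two features.
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
gamma = np.ones(2)
beta = np.zeros(2)
out = batch_norm_unfused(x, gamma, beta, eps=0.0)
# With eps=0 each column is normalized to zero mean and unit variance,
# so out is [[-1, -1], [1, 1]].
```

Fusing saves the intermediate memory traffic between these steps, which is where the reported speedup comes from.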
EDIT: 1/6/2020
The original link no longer works; a web-archive link was provided by Rika. The updated text says:
Fused batch norm combines the multiple operations needed to do batch normalization into a single kernel. Batch norm is an expensive process that for some models makes up a large percentage of the operation time. Using fused batch norm can result in a 12%-30% speedup.