 

What is a fused kernel (or fused layer) in deep learning?

I am reading the Apex AMP documentation:

A Python-only build omits:

  • Fused kernels required to use apex.optimizers.FusedAdam.
  • Fused kernels required to use apex.normalization.FusedLayerNorm.
  • Fused kernels that improve the performance and numerical stability of apex.parallel.SyncBatchNorm.
  • Fused kernels that improve the performance of apex.parallel.DistributedDataParallel and apex.amp. DistributedDataParallel, amp, and SyncBatchNorm will still be usable, but they may be slower.

There also seems to be a "FusedAdam" optimizer:

The Adam optimizer in PyTorch (like all PyTorch optimizers) carries out optimizer.step() by looping over parameters, and launching a series of kernels for each parameter. This can require hundreds of small launches that are mostly bound by CPU-side Python looping and kernel launch overhead, resulting in poor device utilization. Currently, the FusedAdam implementation in Apex flattens the parameters for the optimization step, then carries out the optimization step itself via a fused kernel that combines all the Adam operations. In this way, the loop over parameters as well as the internal series of Adam operations for each parameter are fused such that optimizer.step() requires only a few kernel launches.

The current implementation (in Apex master) is brittle and only works with Amp opt_level O2. I’ve got a WIP branch to make it work for any opt_level (https://github.com/NVIDIA/apex/pull/351). I recommend waiting until this is merged then trying it.

This partially explains it. I'm left with more questions:

What is meant by kernel? A layer or an optimizer?

Is the idea of a fused layer the same as that of a fused optimizer?

Asked Jun 14 '19 by Benjamin Crouzier

People also ask

What is a fused kernel?

"Fusing" means combining computation steps. Essentially, it's an implementation trick to run code more efficiently by merging several similar operations into a single hardware (GPU, CPU, or TPU) operation.

What is fused Adam?

FusedAdam (GPU): implements the Adam algorithm. Currently GPU-only. This version of fused Adam implements two fusions: fusion of the Adam update's elementwise operations, and a multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
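As a rough sketch of what the description above means, the following NumPy code (purely illustrative, not the actual Apex CUDA implementation) contrasts the per-parameter loop style of optimizer.step() with the "flatten everything and update once" style that FusedAdam uses. Because the Adam update is purely elementwise, both strategies produce identical numbers; only the number of launched operations differs.

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on a single tensor: a series of elementwise ops,
    each of which would normally be its own GPU kernel launch."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v

rng = np.random.default_rng(0)
params = [rng.standard_normal(s) for s in [(3, 3), (5,), (2, 4)]]
grads = [rng.standard_normal(p.shape) for p in params]

# "Unfused" style: loop over parameters, one series of ops per parameter.
looped = []
for p, g in zip(params, grads):
    p_new, _, _ = adam_step(p, g, np.zeros_like(p), np.zeros_like(p), t=1)
    looped.append(p_new)

# "Fused" style (the idea behind FusedAdam, sketched): flatten all
# parameters into one contiguous buffer and run the whole update as
# one batched operation instead of one per parameter.
flat_p = np.concatenate([p.ravel() for p in params])
flat_g = np.concatenate([g.ravel() for g in grads])
flat_new, _, _ = adam_step(flat_p, flat_g, np.zeros_like(flat_p),
                           np.zeros_like(flat_p), t=1)

# Both strategies agree elementwise; only the launch count differs.
assert np.allclose(flat_new, np.concatenate([p.ravel() for p in looped]))
```

The real FusedAdam goes further by also fusing the elementwise steps inside adam_step into a single CUDA kernel, but the flattening trick above is what removes the Python-side loop over parameters.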


1 Answer

  • "Kernel" here refers to compute kernels: https://en.wikipedia.org/wiki/Compute_kernel Operations like convolution are often implemented using compute kernels for better efficiency. Compute kernels can be written in C, CUDA, OpenCL, or even assembly for maximum efficiency. It is therefore not surprising that a "Python-only build" does not support them.

  • "Fusing" means combining computation steps. Essentially, it's an implementation trick to run code more efficiently by merging several similar operations into a single hardware (GPU, CPU, or TPU) operation. Therefore, a "fused layer" is a layer whose operations benefit from such a "fused" implementation.
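The fusion idea in the answer can be sketched in plain NumPy (as an illustration only; real fusion happens in compiled GPU code). Computing relu(a*x + b) unfused takes three separate elementwise passes, each of which would be its own kernel launch writing a temporary buffer; a fused version applies all three steps to each element in a single pass.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 8)
a, b = 3.0, 1.0

# Unfused: three separate elementwise operations. On a GPU each line
# would be a separate kernel launch, materializing a full temporary.
t1 = a * x                      # launch 1: scale
t2 = t1 + b                     # launch 2: shift
unfused = np.maximum(t2, 0.0)   # launch 3: ReLU

# Fused: one pass over the data that applies all three steps to each
# element before moving on -- conceptually one launch, no temporaries.
fused = np.empty_like(x)
for i in range(x.size):         # stands in for a single fused kernel
    fused[i] = max(a * x[i] + b, 0.0)

# Same result either way; fusion changes cost, not semantics.
assert np.allclose(fused, unfused)
```

This is why fusion helps both layers (e.g. FusedLayerNorm) and optimizers (e.g. FusedAdam): in both cases a chain of small elementwise operations is collapsed into fewer launches with less memory traffic.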

Answered Oct 13 '22 by ma3oun