I have been following Andrew Ng's videos on neural networks. In these videos, he doesn't associate a bias with each and every neuron. Instead, he adds a bias unit at the head of every layer after its activations have been computed, and uses this bias together with those activations to compute the activations of the next layer (forward propagation).
However, in some other machine learning blogs and videos like this one, a bias is associated with each individual neuron. What is this difference, why does it exist, and what are its implications?
Both approaches represent the same bias concept. For each unit (excluding input nodes) you compute the activation function applied to the dot product of that unit's weight vector with the activations of the previous layer (in the case of a feed-forward network), plus a scalar bias value:

f((w · a) + b)
In Andrew Ng's formulation this value is computed using a vectorisation trick: you concatenate the activations with a fixed bias constant (usually 1), and that does the job, because this constant gets its own weight for each node in the next layer - so it is exactly equivalent to having a separate bias value for each node.
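To make the equivalence concrete, here is a minimal NumPy sketch (the layer sizes, the sigmoid activation, and all variable names are illustrative assumptions, not taken from the videos). It computes one forward-propagation step with a per-neuron bias vector, then repeats it with a prepended bias unit whose weights hold those biases, and checks that the resulting activations match.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # hypothetical layer: 3 inputs -> 2 units
b = rng.standard_normal(2)        # one scalar bias per unit
a = np.array([0.5, -1.0, 2.0])    # activations from the previous layer

# Per-neuron bias formulation: f((w . a) + b) for every unit at once
per_neuron = sigmoid(W @ a + b)

# Bias-unit formulation: prepend a constant 1 to the activations and
# absorb each unit's bias into the extra weight column for that constant.
a_bias_unit = np.concatenate(([1.0], a))   # shape (4,)
W_bias_unit = np.hstack((b[:, None], W))   # biases become weight column 0

via_bias_unit = sigmoid(W_bias_unit @ a_bias_unit)

# Both formulations produce the same activations.
assert np.allclose(per_neuron, via_bias_unit)
```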