I am looking to implement a generic neural network, with 1 input layer consisting of input nodes, 1 output layer consisting of output nodes, and N hidden layers consisting of hidden nodes. Nodes are organized into layers, with the rule that nodes in the same layer cannot be connected.
I mostly understand the concept of the bias, but I have a question.
Should there be one bias value per layer (shared by all nodes in that layer) or should each node (except nodes in the input layer) have their own bias value?
I have a feeling it could be done both ways, and would like to understand the trade-offs of each approach, and also know what implementation is most commonly used.
In the node-based view, there is one bias node per layer; equivalently, each neuron except those in the input layer has its own bias value.
The bias node in a neural network is a node that is always 'on'. That is, its value is set to 1 without regard for the data in a given pattern.
Bias allows you to shift the activation function by adding a constant (i.e. the given bias) to the input. Bias in neural networks can be thought of as analogous to the role of the constant in a linear function, whereby the line is effectively translated by the constant value.
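As a small sketch of that shifting effect (names here are illustrative, not from the question), a single sigmoid neuron's crossing point moves when the bias changes, just as the constant term moves a line:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A single neuron with one input: output = sigmoid(w * x + b).
# The bias b shifts the activation curve horizontally, just as the
# constant term shifts the line y = w * x + b.
def neuron(x, w, b):
    return sigmoid(w * x + b)

# With b = 0 the neuron crosses 0.5 at x = 0; with b = 2 the
# crossing point moves to x = -2 (solve w * x + b = 0).
print(neuron(0.0, 1.0, 0.0))   # 0.5
print(neuron(-2.0, 1.0, 2.0))  # 0.5
```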
To answer this question properly, we should first establish exactly what we mean by "bias value", as used in the question. Neural networks are typically viewed intuitively (and explained to beginners) as a network of nodes (neurons) with weighted, directed connections between them. In this view, biases are very frequently drawn as additional "input" nodes that always have an activation level of exactly 1.0. That value of 1.0 may be what some people think of when they hear "bias value". Such a bias node would have connections to other nodes, with trainable weights; other people may think of those weights as "bias values". Since the question was tagged with the bias-neuron tag, I'll answer under the assumption that we use the first definition, i.e. bias value = 1.0 for some bias node/neuron.
From this point of view, it does not matter at all mathematically how many bias nodes/values we put in our network, as long as we make sure to connect them to the correct nodes. You could intuitively think of the entire network as having only a single bias node with a value of 1.0 that does not belong to any particular layer and has connections to all nodes other than the input nodes. This may be difficult to draw, though; if you want to make a drawing of your neural network, it may be more convenient to place a separate bias node (each with a value of 1.0) in every layer except the output layer, and connect each of those bias nodes to all the nodes in the layer directly after it. Mathematically, these two interpretations are equivalent, since in both cases every non-input node has an incoming weighted connection from a node whose activation level is always 1.0.
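To make that equivalence concrete, here is a small sketch (plain Python; all names are hypothetical) computing one layer's pre-activations two ways: once by appending a constant-1 bias node to the input and folding the biases into the weight matrix, and once with a separate bias value per node:

```python
# View 1: append a constant-1 "bias node" to the input and store the
# biases as an extra column of the weight matrix.
def layer_with_bias_node(x, weights_with_bias_col):
    # weights_with_bias_col[i] = [w_i1, ..., w_in, b_i]
    x_ext = x + [1.0]  # bias node: activation fixed at 1.0
    return [sum(w * a for w, a in zip(row, x_ext))
            for row in weights_with_bias_col]

# View 2: keep one bias value per non-input node.
def layer_with_bias_vector(x, weights, biases):
    return [sum(w * a for w, a in zip(row, x)) + b
            for row, b in zip(weights, biases)]

x = [0.5, -1.0]
W = [[2.0, 3.0], [-1.0, 0.5]]
b = [0.1, -0.2]
W_ext = [row + [bi] for row, bi in zip(W, b)]

print(layer_with_bias_node(x, W_ext))   # same result
print(layer_with_bias_vector(x, W, b))  # either way
```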
When neural networks are programmed, there typically aren't any explicit node "objects" at all (at least in efficient implementations); there will generally just be matrices for the weights. From this point of view, there is no longer any choice: we'll (almost) always want one "bias weight" (a weight multiplied by a constant activation level of 1.0) going to every non-input node, and we'll have to make sure all those weights appear in the correct spots in our weight matrices.
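A minimal sketch of that style of implementation (plain Python lists standing in for matrices, hypothetical names; a real implementation would use a linear-algebra library): each layer stores a weight matrix and a bias vector with one bias weight per node in that layer, i.e. per non-input node of the network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, layers):
    # layers: list of (W, b) pairs. W is a list of weight rows, one
    # row per node in the layer; b holds one bias weight per node,
    # i.e. one bias weight for every non-input node of the network.
    a = x
    for W, b in layers:
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return a

# Tiny 2-input -> 2-hidden -> 1-output network.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.1, -0.1]),  # hidden layer
    ([[2.0, -2.0]],             [0.0]),        # output layer
]
print(forward([1.0, -1.0], layers))
```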