What is the role of the bias in neural networks? [closed]

Tags: artificial-intelligence, machine-learning, neural-network, backpropagation

I'm familiar with gradient descent and the back-propagation algorithm. What I don't get is: when is using a bias important, and how do you use it?

For example, when I train a network on the AND function with two inputs and one output, it does not learn the correct weights. However, when I use three inputs (one of which is a bias), it does learn the correct weights.
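
The observation above can be reproduced with a minimal sketch, assuming a single sigmoid unit trained by full-batch gradient descent on a cross-entropy loss. The starting weights, learning rate, epoch count, and the 0.5 decision threshold are illustrative assumptions and do not come from the question.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Truth table for AND as ((x1, x2), target) pairs.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def train(use_bias, epochs=5000, lr=1.0):
    """Fit one sigmoid unit by full-batch gradient descent on cross-entropy."""
    w = [0.5, -0.3]  # arbitrary fixed starting weights (assumption)
    b = 0.0          # weight on the constant bias input; frozen at 0 if unused
    n = len(AND_DATA)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), target in AND_DATA:
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - target
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err  # the bias input is the constant 1.0
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        if use_bias:
            b -= lr * grad_b / n
    return w, b

def predict(w, b):
    """Threshold the unit's output at 0.5 for each row of the truth table."""
    return [int(sigmoid(w[0] * x1 + w[1] * x2 + b) > 0.5)
            for (x1, x2), _ in AND_DATA]

targets = [t for _, t in AND_DATA]
pred_without_bias = predict(*train(use_bias=False))
pred_with_bias = predict(*train(use_bias=True))

print("targets:     ", targets)
print("without bias:", pred_without_bias)
print("with bias:   ", pred_with_bias)
```

Without the bias, the unit's output at (0, 0) is sig(0) = 0.5 whatever the weights are, and no pair of weights can keep (0, 1) and (1, 0) at or below 0.5 while pushing (1, 1) above it, so the run without a bias cannot reproduce AND.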

asked Mar 19 '10 21:03

Karan

2 Answers

I think that biases are almost always helpful. In effect, a bias value allows you to shift the activation function to the left or right, which may be critical for successful learning.

It might help to look at a simple example. Consider this 1-input, 1-output network that has no bias:

[Figure: simple network (https://i.stack.imgur.com/bI2Tm.gif)]

The output of the network is computed by multiplying the input (x) by the weight (w₀) and passing the result through some kind of activation function (e.g. a sigmoid function).

Here is the function that this network computes, for various values of w₀:

[Figure: network output, given different w0 weights (https://i.stack.imgur.com/ddyfr.png)]

Changing the weight w₀ essentially changes the "steepness" of the sigmoid. That's useful, but what if you wanted the network to output 0 (or a value close to 0) when x is 2? Just changing the steepness of the sigmoid won't really work; you want to be able to shift the entire curve to the right.
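
A minimal sketch of why steepness alone is not enough, assuming the standard logistic sigmoid; the helper name `no_bias_output` and the sample w₀ values are made up for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def no_bias_output(x, w0):
    """Output of the 1-input network without a bias: sig(w0 * x)."""
    return sigmoid(w0 * x)

# Arbitrary positive weights: each one only changes how steep the curve is.
for w0 in (0.5, 1.0, 2.0):
    at_zero = no_bias_output(0.0, w0)
    at_two = no_bias_output(2.0, w0)
    print(f"w0={w0}: output at x=0 is {at_zero:.3f}, at x=2 is {at_two:.3f}")
```

Every curve passes through 0.5 at x = 0, and for any positive w₀ the output at x = 2 stays above 0.5. A negative w₀ would bring the output at x = 2 closer to 0, but only by flipping the curve so that it decreases with x; it still passes through 0.5 at x = 0.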

That's exactly what the bias allows you to do. If we add a bias to that network, like so:

[Figure: simple network with a bias (https://i.stack.imgur.com/oapHD.gif)]

...then the output of the network becomes sig(w₀*x + w₁*1.0). Here is what the output of the network looks like for various values of w₁:

[Figure: network output, given different w1 weights (https://i.stack.imgur.com/t2mC3.png)]

Having a weight of -5 for w₁ shifts the curve to the right, which allows us to have a network that outputs a value close to 0 when x is 2.
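
The shift can be checked numerically with a short sketch. It assumes w₀ is held at 1.0, which the text above does not state, and the helper name `biased_output` is made up for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def biased_output(x, w0, w1):
    """Output of the 1-input network with a bias: sig(w0 * x + w1 * 1.0)."""
    return sigmoid(w0 * x + w1 * 1.0)

W0 = 1.0  # assumed input weight, kept fixed while the bias weight varies

for w1 in (-5.0, 0.0, 5.0):
    # The curve crosses 0.5 where w0 * x + w1 = 0, i.e. at x = (0 - w1) / w0.
    midpoint = (0.0 - w1) / W0
    at_two = biased_output(2.0, W0, w1)
    print(f"w1={w1}: crosses 0.5 at x={midpoint}, output at x=2 is {at_two:.4f}")
```

Under that assumption, w₁ = -5 moves the point where the curve crosses 0.5 from x = 0 to x = 5, and the output at x = 2 becomes sig(-3), which is about 0.047: close to 0, though a sigmoid never reaches 0 exactly.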

answered Oct 12 '22 11:10

Nate Kohl

A simpler way to understand the bias: it is somewhat similar to the constant b of a linear function

y = ax + b

It allows you to move the line up and down so that the prediction fits the data better.

Without b, the line always passes through the origin (0, 0), and you may get a poorer fit.
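
A small sketch of the same idea, using ordinary least squares on made-up points that lie exactly on y = 2x + 3 (the data and the function names are illustrative assumptions):

```python
def fit_with_intercept(xs, ys):
    """Least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def fit_through_origin(xs, ys):
    """Least-squares fit of y = a*x, with no constant term."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def squared_error(xs, ys, a, b):
    """Sum of squared residuals for the line y = a*x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Made-up data lying exactly on y = 2x + 3.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_with_intercept(xs, ys)
a_origin = fit_through_origin(xs, ys)

sse_with_b = squared_error(xs, ys, a, b)
sse_without_b = squared_error(xs, ys, a_origin, 0.0)

print(f"with b:    a={a:.3f}, b={b:.3f}, squared error={sse_with_b:.3f}")
print(f"without b: a={a_origin:.3f}, squared error={sse_without_b:.3f}")
```

With the constant term, the fit recovers the line the points were generated from. Forced through the origin, the best line has to tilt more steeply to compensate and still leaves a noticeable error.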

answered Oct 12 '22 12:10

zfy
