Could someone please give me a mathematically correct explanation of why a Multilayer Perceptron can solve the XOR problem?
My interpretation of the perceptron is as follows:
A perceptron with two inputs x1 and x2 computes the following linear function: f(x1, x2) = step(w1*x1 + w2*x2 - theta), and is hence able to solve linearly separable problems such as AND and OR.
step is the basic step function.
The way I think of it is that I take the two parts inside step() separated by the + sign, set w1*x1 + w2*x2 = theta and rearrange, and I get x2 = -(w1/w2)*x1 + theta/w2, which is a line. By applying the step function I get one of the two clusters depending on the input, which I interpret as one of the half-spaces separated by that line.
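In code, my interpretation looks roughly like this (a minimal sketch; the weights and thresholds I picked for AND and OR are just one possible choice):

```python
def step(z):
    # basic step function: 1 if the argument is non-negative, else 0
    return 1 if z >= 0 else 0

def perceptron(x1, x2, w1, w2, theta):
    # step function applied to the weighted sum minus the threshold
    return step(w1 * x1 + w2 * x2 - theta)

for x1 in (0, 1):
    for x2 in (0, 1):
        and_out = perceptron(x1, x2, 1.0, 1.0, 1.5)  # fires only for (1, 1)
        or_out  = perceptron(x1, x2, 1.0, 1.0, 0.5)  # fires unless (0, 0)
        print(x1, x2, "AND:", and_out, "OR:", or_out)
```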
Because the function of an MLP is still linear, how do I interpret this in a mathematical way, and more importantly: why is it able to solve the XOR problem when it's still linear? Is it because it's interpolating a polynomial?
The XOR problem can be solved by a Multi-Layer Perceptron, i.e. a neural network architecture with an input layer, a hidden layer, and an output layer. During training, the forward pass computes the activations of each layer, backpropagation updates the weights of the corresponding layers, and the network learns to reproduce the XOR logic.
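As a rough illustration (assuming scikit-learn is available; the architecture and hyperparameters below are just one plausible choice, not a prescribed setup), a network with a single hidden layer can be trained on the four XOR examples:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# The four input combinations and the XOR truth table as labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# One hidden layer with 2 units is enough to represent XOR. With only four
# training points, convergence depends on the random initialisation, so a
# different random_state (or a few restarts) may be needed.
clf = MLPClassifier(hidden_layer_sizes=(2,), activation='tanh',
                    solver='lbfgs', max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # ideally [0 1 1 0]
```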
The XOR problem is that we need to build a neural network (built from perceptrons in our case) that produces the truth table of the XOR logical operator. This is a binary classification problem, so supervised learning is a natural way to solve it.
Linearly separable data basically means that you can separate data with a point in 1D, a line in 2D, a plane in 3D and so on. A perceptron can only converge on linearly separable data. Therefore, it isn't capable of imitating the XOR function.
On the surface, XOR appears to be a very simple problem; however, Minsky and Papert (1969) showed that it was a big problem for the neural network architectures of the 1960s, known as perceptrons.
You are looking for a mathematical explanation, so let's first take a look at how a perceptron works:
The inputs get weighted and summed up. If the sum exceeds a threshold theta, 1 is returned, otherwise 0. In the XOR case, x1 and x2 can each be either 1 or 0, and you are searching for weights w1 and w2 as well as a threshold theta such that, whenever x1 XOR x2 is true:
w1*x1 + w2*x2 >= theta
or, equivalently,
w1*x1 + w2*x2 - theta >= 0
First, you can see that the left-hand side is linear in x1 and x2, so the decision boundary it defines is a line. But when you look at the sample space, there is no line that can separate the positive from the negative cases.
Second, you can try it out. Take an arbitrary theta, let's say 0.5.
Case 1: x1 = 1, x2 = 0 => w1 needs to be > 0.5
Case 2: x1 = 0, x2 = 1 => w2 needs to be > 0.5
Case 3: x1 = 1, x2 = 1 => w1+w2 needs to be < 0.5 => impossible due to previous two cases
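To make the same point empirically, here is a small brute-force sketch (the grid range and step size are arbitrary choices, and scanning a grid is of course not a proof): no combination of w1, w2 and theta on the grid reproduces the XOR truth table.

```python
import itertools

# XOR truth table: (x1, x2) -> target output
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# candidate values for w1, w2 and theta, from -2.0 to 2.0 in steps of 0.1
grid = [i / 10 - 2 for i in range(41)]

solutions = [
    (w1, w2, theta)
    for w1, w2, theta in itertools.product(grid, repeat=3)
    if all((w1 * x1 + w2 * x2 >= theta) == bool(t) for (x1, x2), t in XOR.items())
]
print(len(solutions))  # prints 0: no single perceptron on this grid realises XOR
```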
In general, with a perceptron you can only define decision boundaries that are linear, i.e. lines, planes, hyperplanes, etc., so you can only separate linearly separable data.
But for the XOR case you need two lines:
For each of these lines you need one hidden node, and then you combine them while taking the negation into account.
You can see a solution here:
How to solve XOR problem with MLP neural network?
So the trick is not to introduce non-linearity but to rewrite XOR into something like:
x1 XOR x2 == NOT (x1 AND x2) AND (x1 OR x2)
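As a concrete sketch of that rewrite (the weights and thresholds below are one hand-picked choice, not the only one): two hidden step units compute OR and AND, and the output unit computes "OR and not AND".

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # hidden node 1: x1 OR x2
    h_and = step(x1 + x2 - 1.5)      # hidden node 2: x1 AND x2
    return step(h_or - h_and - 0.5)  # output: h_or AND NOT h_and

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))  # reproduces the XOR truth table
```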
Try plotting the sample space of the XOR function of two variables x1 and x2. The decision boundary separating the positive (y=1) and negative (y=0) examples is clearly not a straight line but a non-linear boundary.
Modelling a non-linear decision boundary cannot be done with a simple neural network consisting of only input and output layers; hence, a hidden layer is required. Functions like AND, OR, and NOT, on the other hand, have linear decision boundaries and can therefore be modelled by simple input-output nets.