 

Neural Network: Solving XOR

Could someone please give me a mathematical correct explanation why a Multilayer Perceptron can solve the XOR problem?

My interpretation of the perceptron is as follows:

A perceptron with two inputs x1 and x2 computes the following linear threshold function and is hence able to solve linearly separable problems such as AND and OR:

f(x1, x2) = H(w1*x1 + w2*x2 - theta)

where H is the basic step function.

The way I think of it is that I take the two parts within the sum separated by the + sign, w1*x1 and w2*x2, set the expression inside H to zero and get w1*x1 + w2*x2 = theta, which is a line. By applying the step function I get one of the clusters with respect to the input, which I interpret as one of the half-planes separated by that line.

Because the function of an MLP is still linear, how do I interpret this mathematically, and more importantly: why is it able to solve the XOR problem when it's still linear? Is it because it's interpolating a polynomial?

Bastian asked Jun 09 '16 19:06




2 Answers

You are looking for a mathematical explanation, so let's first take a look on how a perceptron works:

(figure: a simple perceptron with two-dimensional input)

The input gets weighted and summed up. If it exceeds a threshold theta, 1 is returned, otherwise 0. In the XOR case x1 and x2 can be either 1 or 0 and you are searching for weights w1 and w2 as well as a threshold theta such that in case of x1 XOR x2:

w1*x1 + w2*x2 >= theta

or equivalently

w1*x1 + w2*x2 - theta >= 0
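The threshold rule above can be sketched in code; the weights and threshold below are illustrative (they solve AND, which is linearly separable, not XOR):

```python
# Minimal sketch of the perceptron described above: weighted sum of the
# inputs, then a step at threshold theta.
def perceptron(x1, x2, w1, w2, theta):
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

# AND is linearly separable, so a single perceptron handles it,
# e.g. w1 = w2 = 1 and theta = 1.5:
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(x1, x2, 1, 1, 1.5))
```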

First, you can see that the function is linear. This means that it defines a line. But when you look at the sample space, there is no line that can separate the positive from the negative cases.

Second, you can try it out. Take an arbitrary theta, let's say 0.5.

Case 1: x1 = 1, x2 = 0 => w1 needs to be >= 0.5

Case 2: x1 = 0, x2 = 1 => w2 needs to be >= 0.5

Case 3: x1 = 1, x2 = 1 => w1 + w2 needs to be < 0.5 => impossible given the previous two cases
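The case analysis above can also be checked by brute force. This is an illustrative scan over a coarse grid of weights and thresholds, not a proof, but no combination reproduces XOR:

```python
# Brute-force check: scan a grid of weights and thresholds and confirm
# that no single-layer perceptron reproduces the XOR truth table.
def perceptron(x1, x2, w1, w2, theta):
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [i / 10 for i in range(-20, 21)]  # -2.0 .. 2.0 in steps of 0.1
solutions = [
    (w1, w2, theta)
    for w1 in grid for w2 in grid for theta in grid
    if all(perceptron(x1, x2, w1, w2, theta) == y
           for (x1, x2), y in xor_table.items())
]
print(len(solutions))  # 0 -- no weight/threshold combination works
```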

In general, with a perceptron you can only represent functions that are linearly separable, i.e. separable by lines, planes, hyperplanes, etc.

But for the XOR case you need two lines:

(figure: the XOR sample space, with the two lines needed to separate the classes)

For each line, you need one hidden node and then combine things together while taking the negation into account.

You can see a solution here:

How to solve XOR problem with MLP neural network?

So the trick is not to get non-linear but rewrite XOR into something like:

x1 XOR x2 == NOT (x1 AND x2) AND (x1 OR x2)
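That rewrite can be wired up as a two-layer network with hand-picked (not learned) weights: the hidden units compute OR and AND, and the output unit combines them, taking the negation into account via a negative weight on the AND unit:

```python
# Two-layer step-function network implementing
# x1 XOR x2 == NOT (x1 AND x2) AND (x1 OR x2).
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    h_or  = step(1.0 * x1 + 1.0 * x2 - 0.5)   # hidden unit 1: x1 OR x2
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # hidden unit 2: x1 AND x2
    # output unit: h_or AND (NOT h_and), via a negative weight on h_and
    return step(1.0 * h_or - 1.0 * h_and - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))
```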

Thomas Kutz answered Oct 10 '22 02:10


Try plotting the sample space of an XOR function of two variables x1 and x2. The decision boundary separating the positive (y=1) and negative (y=0) examples is clearly not a straight line but a non-linear decision boundary, as follows:

(figure: the XOR sample space with its non-linear decision boundary)

Modelling a non-linear decision boundary cannot be done by a simple neural network consisting of only input and output layers; hence, a hidden layer is required to model it. On the other hand, functions like AND, OR and NOT have linear decision boundaries and so can be modelled by simple input-output neural nets.
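For contrast, each of AND, OR and NOT needs only a single unit with one linear boundary; the weights and thresholds below are hand-picked for illustration:

```python
# Single-unit perceptrons for the linearly separable gates: each is just
# a weighted sum and one threshold, i.e. a single linear boundary.
def step(z):
    return 1 if z >= 0 else 0

def AND(x1, x2):  # boundary: x1 + x2 = 1.5
    return step(x1 + x2 - 1.5)

def OR(x1, x2):   # boundary: x1 + x2 = 0.5
    return step(x1 + x2 - 0.5)

def NOT(x):       # boundary: x = 0.5 (negative weight flips the half-space)
    return step(-x + 0.5)
```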

Nagabhushan Baddi answered Oct 10 '22 00:10