The following code was written to learn the XOR function, but about half the time the network does not learn and the loss stays the same after every epoch.
train_f = [[0, 0], [0, 1], [1, 0], [1, 1]]
train_c = [[0], [1], [1], [0]]
test_f = train_f
test_c = train_c
import tensorflow as tf
import tflearn
X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
Y_xor = [[0.], [1.], [1.], [0.]]
# Graph definition
with tf.Graph().as_default():
    # Building the network
    net = tflearn.input_data(shape=[None, 2])
    net = tflearn.fully_connected(net, 2, activation='relu')
    net = tflearn.fully_connected(net, 2, activation='relu')
    net = tflearn.fully_connected(net, 1, activation='sigmoid')
    regressor = tflearn.regression(net, optimizer='adam', learning_rate=0.005, loss="mean_square")

    # Training
    m = tflearn.DNN(regressor)
    m.fit(X, Y_xor, n_epoch=256, snapshot_epoch=False)

    # Testing
    print("Testing XOR operator")
    print("0 xor 0:", m.predict([[0., 0.]]))
    print("0 xor 1:", m.predict([[0., 1.]]))
    print("1 xor 0:", m.predict([[1., 0.]]))
    print("1 xor 1:", m.predict([[1., 1.]]))
Sometimes I get correct results like this:
Testing XOR operator
0 xor 0: [[0.1487255096435547]]
0 xor 1: [[0.9297153949737549]]
1 xor 0: [[0.9354135394096375]]
1 xor 1: [[0.1487255096435547]]
But often this:
Testing XOR operator
0 xor 0: [[0.4999997615814209]]
0 xor 1: [[0.5000002384185791]]
1 xor 0: [[0.4999997615814209]]
1 xor 1: [[0.5000001788139343]]
My 2x2x1 network should be able to perform XOR, and there is even some evidence suggesting that this network should always converge: http://www.ncbi.nlm.nih.gov/pubmed/12662805
I have also tried changing the relu layers to sigmoid, running 2048 iterations, and using 4x4x1 and 6x6x1 networks, but the same problem still occurs sometimes.
Could there be something wrong with how the weights are initialized? How do I use tflearn to have a neural net learn the XOR function?
A network with relus (as written in the code snippet) is expected to often fail to train. The reason is that if the input to a relu is less than zero, its output is zero, and therefore the gradient flowing back through it is also zero.
Since you have two layers, each with only two relu units, under random initialization each of those layers has a 25% chance that all of its neurons return zero, and therefore that zero gradient flows back => the neural network will not learn at all. In such a network the output of the last layer (before the final sigmoid) will be zero, and the sigmoid of zero is 0.5 -- exactly what you observe on the attempts where your network didn't converge.
Since each layer has a 25% chance of doing this damage, the entire network has a total chance of around 45% (1 - (1 - 0.25)^2) of failing to train from the get-go. There is also a non-zero chance that the network is not in such a state at initialization but drives itself into one during training, further increasing the chance of divergence.
With four neurons per layer the chance is significantly lower, but still not zero.
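For concreteness, here is a rough back-of-the-envelope check of that estimate. It assumes each relu unit independently has about a 50% chance of being "dead" (zero output for all inputs) at initialization; that probability and the independence of the units are simplifying assumptions, not exact properties of the initializer:

def p_network_dead(units_per_layer, n_hidden_layers=2, p_unit_dead=0.5):
    # A hidden layer is "dead" if every one of its relu units outputs zero.
    p_layer_dead = p_unit_dead ** units_per_layer
    # The network is stuck if at least one hidden layer is dead.
    return 1 - (1 - p_layer_dead) ** n_hidden_layers

print(p_network_dead(2))  # ~0.44 for the 2x2x1 network
print(p_network_dead(4))  # ~0.12 for a 4x4x1 network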
Now, the only thing I cannot answer is why your network doesn't converge when you replace relu with sigmoid -- such a network should always be able to learn "xor". My only hypothesis is that you replaced only one relu with sigmoid, not both of them.
Can you replace both relus with sigmoids and confirm you still observe divergence?