
How to use keras for XOR

I want to practice Keras by coding an XOR network, but the result is not right. My code is below; thanks to everybody who can help.

from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
import numpy as np

model = Sequential()  # two layers: 2 -> 4 -> 1
model.add(Dense(input_dim=2, output_dim=4, init="glorot_uniform"))
model.add(Activation("sigmoid"))
model.add(Dense(input_dim=4, output_dim=1, init="glorot_uniform"))
model.add(Activation("sigmoid"))
sgd = SGD(l2=0.0, lr=0.05, decay=1e-6, momentum=0.11, nesterov=True)
model.compile(loss='mean_absolute_error', optimizer=sgd)
print "begin to train"
list1 = [1, 1]
label1 = [0]
list2 = [1, 0]
label2 = [1]
list3 = [0, 0]
label3 = [0]
list4 = [0, 1]
label4 = [1]
train_data = np.array((list1, list2, list3, list4))  # four samples, trained for 1000 epochs
label = np.array((label1, label2, label3, label4))

model.fit(train_data, label, nb_epoch=1000, batch_size=4, verbose=1, shuffle=True, show_accuracy=True)
list_test = [0, 1]
test = np.array((list_test, list1))
classes = model.predict(test)
print classes

Output

[[ 0.31851079]
 [ 0.34130159]]
[[ 0.49635666]
 [ 0.51274764]]
asked Jul 22 '15 by Jaspn Wjbian



3 Answers

If I increase the number of epochs in your code to 50000, it does often converge to the right answer for me; it just takes a little while :)

It does often get stuck, though. I get better convergence properties if I change your loss function to 'mean_squared_error', which is a smoother function.

I get even faster convergence if I use the Adam or RMSProp optimizers. My final compile and fit lines, which work:

model.compile(loss='mse', optimizer='adam')
...
model.fit(train_data, label, nb_epoch=10000, batch_size=4, verbose=1, shuffle=True, show_accuracy=True)
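
For reference, here's a sketch of the asker's full script with just those two changes applied. It assumes the same old Keras API as the question (where fit still takes nb_epoch and show_accuracy); on current Keras those would be epochs and a metrics argument to compile:

from keras.models import Sequential
from keras.layers.core import Dense, Activation
import numpy as np

# Same (2, 4, 1) architecture as in the question
model = Sequential()
model.add(Dense(input_dim=2, output_dim=4, init="glorot_uniform"))
model.add(Activation("sigmoid"))
model.add(Dense(input_dim=4, output_dim=1, init="glorot_uniform"))
model.add(Activation("sigmoid"))

# The two changes: the smoother MSE loss and the Adam optimizer
model.compile(loss='mse', optimizer='adam')

train_data = np.array([[1, 1], [1, 0], [0, 0], [0, 1]])
label = np.array([[0], [1], [0], [1]])

model.fit(train_data, label, nb_epoch=10000, batch_size=4, verbose=1, shuffle=True, show_accuracy=True)
print(model.predict(train_data))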
answered Oct 20 '22 by wxs


I used a single hidden layer with 4 hidden nodes, and it almost always converges to the right answer within 500 epochs. I used sigmoid activations.
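
In case it helps, a minimal sketch of that setup, written against the Keras 2 API. The answer doesn't name a loss or optimizer, so MSE and Adam are assumed here, as in the answer above:

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# One hidden layer with 4 sigmoid nodes, then a single sigmoid output
model = Sequential()
model.add(Dense(4, activation='sigmoid', input_dim=2))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=500, batch_size=4, verbose=0)
print(model.predict(X))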

answered Oct 20 '22 by Anon


XOR training with Keras

Below is the minimal neural network architecture required to learn XOR, which should be a (2,2,1) network. The maths shows that a (2,2,1) network can solve the XOR problem, but it doesn't show that a (2,2,1) network is easy to train: it can sometimes take a lot of epochs (iterations) or fail to converge to the global minimum. That said, I've easily gotten good results with (2,3,1) or (2,4,1) network architectures.

The problem seems to be related to the existence of many local minima. Look at this 1998 paper, «Learning XOR: exploring the space of a classic problem» by Richard Bland. Furthermore, initializing the weights with random numbers between 0.5 and 1.0 helps convergence.

It works fine with Keras or TensorFlow using the 'mean_squared_error' loss function, sigmoid activations and the Adam optimizer. Even with pretty good hyperparameters, I observed that the learned XOR model gets trapped in a local minimum about 15% of the time.

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

# Draw the initial weights around 0.75 (cf. the 0.5-1.0 range mentioned above)
def initialize_weights(shape, dtype=None):
    return np.random.normal(loc=0.75, scale=1e-2, size=shape)

model = Sequential()
model.add(Dense(2, 
                activation='sigmoid', 
                kernel_initializer=initialize_weights, 
                input_dim=2))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='mean_squared_error', 
              optimizer='adam', 
              metrics=['accuracy'])

print("*** Training... ***")

model.fit(X, y, batch_size=4, epochs=10000, verbose=0)

print("*** Training done! ***")

print("*** Model prediction on [[0,0],[0,1],[1,0],[1,1]] ***")

print(model.predict_proba(X))

*** Training... ***
*** Training done! ***
*** Model prediction on [[0,0],[0,1],[1,0],[1,1]] ***
[[0.08662204]
 [0.9235283 ]
 [0.92356336]
 [0.06672956]]
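
As a rough sanity check on that ~15% figure, one can retrain from scratch a number of times and count the runs whose thresholded predictions don't match the labels. A sketch (20 runs and a 0.5 decision threshold; slow, since each run trains for 10000 epochs):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

def initialize_weights(shape, dtype=None):
    return np.random.normal(loc=0.75, scale=1e-2, size=shape)

runs, stuck = 20, 0
for _ in range(runs):
    # Rebuild the model each time so every run starts from fresh weights
    model = Sequential()
    model.add(Dense(2, activation='sigmoid',
                    kernel_initializer=initialize_weights, input_dim=2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X, y, batch_size=4, epochs=10000, verbose=0)

    # A run counts as stuck if the rounded predictions don't match y
    preds = (model.predict(X) > 0.5).astype(int)
    if not np.array_equal(preds, y):
        stuck += 1

print("stuck in a local minimum: {} / {} runs".format(stuck, runs))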

answered Oct 20 '22 by Claude COULOMBE