I am playing around with Keras v2.0.8 in Python v2.7 (Tensorflow backend) to create small neural networks that calculate simple arithmetic functions (add, subtract, multiply, etc.), and am a bit confused. The code below generates a random training dataset of integer pairs, labels each pair with the sum of its two inputs, and trains my network on it:
import numpy as np
from math import sqrt
from keras.models import Sequential
from keras.layers import Dense

def create_data(low, high, examples):
    # Build `examples` random integer pairs and label each pair with its sum.
    train_data = []
    label_data = []
    a = np.random.randint(low=low, high=high, size=examples, dtype='int')
    b = np.random.randint(low=low, high=high, size=examples, dtype='int')
    for i in range(0, examples):
        train_data.append([a[i], b[i]])
        label_data.append((a[i] + b[i]))
    train_data = np.array(train_data)
    label_data = np.array(label_data)
    return train_data, label_data

X, y = create_data(0, 500, 10000)

model = Sequential()
model.add(Dense(3, input_dim=2))
model.add(Dense(5, activation='relu'))
model.add(Dense(3, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
model.fit(X, y, epochs=10, batch_size=10)

test_data, _ = create_data(0, 500, 10)
results = model.predict(test_data, batch_size=2)

sq_error = []
for i in range(0, len(test_data)):
    error = results[i][0] - (test_data[i][0] + test_data[i][1])
    print 'test value:', test_data[i], 'result:', results[i][0], 'error:', '%.2f' % error
    sq_error.append(error ** 2)  # square each error so the RMSE below is meaningful
# RMSE is the square root of the mean of the squared errors.
print '\n total rmse error: ', sqrt(np.mean(np.array(sq_error)))
This trains perfectly well and produces no unexpected results. However, when I create the training data by multiplying the two inputs together, the model's loss for each epoch stays around 7,000,000,000 and the model does not converge at all. The data creation function for this is as follows:
def create_data(low, high, examples):
    train_data = []
    label_data = []
    a = np.random.randint(low=low, high=high, size=examples, dtype='int')
    b = np.random.randint(low=low, high=high, size=examples, dtype='int')
    for i in range(0, examples):
        train_data.append([a[i], b[i]])
        label_data.append((a[i] * b[i]))
    train_data = np.array(train_data)
    label_data = np.array(label_data)
    return train_data, label_data
I also had the same problem when I had training data of a single input integer and created the label by squaring the input data. However, it worked fine when I only multiplied the single input by a constant value or added/subtracted by a constant.
I have two questions:
1) Why is this the case? I assume it has something to do with the fundamentals of neural networks, but I can't work it out.
2) How could I adapt this code to train a model that multiplies two input numbers together?
The network architecture (2 - 3 - 5 - 3 - 5 - 1) is fairly random right now. I've tried lots of different ones, varying the number of layers and neurons; this one just happened to be on my screen as I write this, and it got an accuracy of 100% for adding two inputs.
This is caused by large gradient updates, which in turn come from the large numbers in the training data: the product of two inputs below 500 can reach almost 250,000. When using a neural network, you should first make sure that the training data falls in a small range (usually [-1, 1] or [0, 1]); this helps the optimization process and prevents disruptive gradient updates. So you should normalize the data first. In this case, one good candidate is log-normalization.
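As a rough sketch of what log-normalization could look like here: the helper names `create_log_data` and `denormalize` are hypothetical, the inputs are assumed to start at 1 rather than 0 (log(0) is undefined), and the standard-library `random` module stands in for `np.random.randint` only to keep the snippet self-contained. It first shows how large the raw labels are, then shows that in log space everything lies in a small range and the target becomes a plain sum, since log(a * b) = log(a) + log(b).

```python
import math
import random

# Raw scale: a and b are integers in [0, 500), so labels reach 499 * 499.
raw_labels = [a * b for a in range(500) for b in range(500)]
max_raw_label = max(raw_labels)
# MSE of a hypothetical model that outputs 0 for every input: about 6.9e9,
# the same order of magnitude as the loss reported in the question.
zero_model_mse = sum(v * v for v in raw_labels) / float(len(raw_labels))

def create_log_data(low, high, examples, seed=0):
    """Return log-scaled input pairs and log-scaled product labels.

    Assumes low >= 1, because log(0) is undefined.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for _ in range(examples):
        a = rng.randint(low, high - 1)  # keep `high` exclusive, as in the question
        b = rng.randint(low, high - 1)
        inputs.append([math.log(a), math.log(b)])
        labels.append(math.log(a * b))
    return inputs, labels

def denormalize(log_prediction):
    """Map a prediction made in log space back to the original scale."""
    return math.exp(log_prediction)

X_log, y_log = create_log_data(1, 500, 10000)

# Log scale: inputs stay below log(499) and labels below 2 * log(499).
max_input = max(max(row) for row in X_log)
max_label = max(y_log)

# In log space the label is the sum of the two inputs (up to rounding error).
max_gap = max(abs(x[0] + x[1] - y) for x, y in zip(X_log, y_log))
```

A model trained on `X_log` and `y_log` would predict the log of the product, so its outputs would need to go through `denormalize` before being compared with `a * b`. Dividing the log values by a constant would bring them closer to [0, 1] if that turns out to be needed.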
Further, 'accuracy' as a metric in Keras is meant for classification problems. In a regression problem it does not make sense; it is better to use a relevant metric such as mean absolute error ('mae').
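A small illustration of the difference, using made-up numbers (the predictions below are hypothetical, not output from the model in the question). The helper `exact_match_accuracy` is only a simplified stand-in for an accuracy-style metric, not Keras's implementation: it cannot tell a close regression fit from a poor one, while mean absolute error can.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that are exactly equal to the target."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / float(len(y_true))

def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between prediction and target."""
    total = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total / float(len(y_true))

y_true = [12.0, 250.0, 731.0, 998.0]
close_pred = [12.1, 249.8, 731.3, 997.6]  # hypothetical: a good regression fit
far_pred = [400.0, 10.0, 90.0, 55.0]      # hypothetical: a poor regression fit

# Neither set of predictions hits a target exactly, so both score the same.
acc_close = exact_match_accuracy(y_true, close_pred)
acc_far = exact_match_accuracy(y_true, far_pred)

# Mean absolute error separates the two fits clearly.
mae_close = mean_absolute_error(y_true, close_pred)
mae_far = mean_absolute_error(y_true, far_pred)
```

In the question's code, this would amount to passing metrics=['mae'] instead of metrics=['accuracy'] to model.compile.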