I have built a TensorFlow network that reads data from this dataset (note: the information in the dataset is purely for test purposes and is not real) and tries to predict the values in the 'Exited' column. The network takes 11 inputs, passes them through 2 hidden layers (6 neurons each) with ReLU activations, and outputs a single binary value through a sigmoid activation so the result can be interpreted as a probability. I am using a gradient descent optimizer and a mean squared error cost function. However, after training the network on my training data and predicting on my testing data, all my predicted values are greater than 0.5, meaning everything is classified as likely to be true, and I'm not sure what the problem is:
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.2, random_state=101)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.fit_transform(X_test)
training_epochs = 200
n_input = 11
n_hidden_1 = 6
n_hidden_2 = 6
n_output = 1
def neuralNetwork(x, weights):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    output_layer = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
    output_layer = tf.nn.sigmoid(output_layer)
    return output_layer
weights = {
    'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2])),
    'output': tf.Variable(tf.random_uniform([n_hidden_2, n_output]))
}
biases = {
    'b1': tf.Variable(tf.random_uniform([n_hidden_1])),
    'b2': tf.Variable(tf.random_uniform([n_hidden_2])),
    'output': tf.Variable(tf.random_uniform([n_output]))
}
x = tf.placeholder('float', [None, n_input]) # [?, 11]
y = tf.placeholder('float', [None, n_output]) # [?, 1]
output = neuralNetwork(x, weights)
cost = tf.reduce_mean(tf.square(output - y))
optimizer = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        session.run(optimizer, feed_dict={x: X_train, y: y_train.reshape((-1, 1))})
    print('Model has completed training.')
    test = session.run(output, feed_dict={x: X_test})
    predictions = (test > 0.5).astype(int)
    print(predictions)
All help is appreciated! I have been looking through questions related to my problem but none of the suggestions have seemed to help.
Initial note: I won't access data from a personal link for security reasons. It would be better if you could create a reproducible code snippet based solely on publicly available, persistent artifacts.
However, I can confirm your problem happens when your code is run against keras.datasets.mnist, with a small change: each sample is associated with a label 0: odd or 1: even.
Short answer: you messed up the initialization. Change tf.random_uniform to tf.random_normal and set the biases to a deterministic 0.
Actual answer: ideally, you want the model to start predicting randomly, close to 0.5. This prevents the sigmoid's output from saturating and results in large gradients in the early stages of training.
The sigmoid's equation is s(y) = 1 / (1 + e**-y), and s(y) = 0.5 <=> y = 0. Therefore, the layer's output y = w * x + b must start out at 0.
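For intuition, here is a quick numpy check (purely illustrative, not part of the original code) of both facts: s(0) = 0.5, and the gradient s'(y) = s(y) * (1 - s(y)) collapses once |y| is large:
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

for y in [0.0, 2.0, 10.0]:
    s = sigmoid(y)
    grad = s * (1.0 - s)  # derivative of the sigmoid
    print('y=%g  s(y)=%.5f  grad=%.5f' % (y, s, grad))
# y=0  -> s=0.5, gradient 0.25 (the largest it can be)
# y=10 -> s~1.0, gradient ~0.00005 (saturated: almost no learning signal)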
If you used StandardScaler, then your input data is standardized to mean = 0.0, std = 1.0. Your parameters should preserve that centering! However, you've initialized your biases with tf.random_uniform, which draws values uniformly from the [0, 1) interval.
By starting your biases at 0, y will be close to 0:
y = w * x + b = sum(.1 * -1, .9 * -.9, ..., .1 * 1, .9 * .9) + 0 = 0
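A small numpy sketch of that argument (made-up sizes, purely illustrative): with standardized inputs, zero biases keep the pre-activations centered around 0, while uniform biases from [0, 1) push them away from 0:
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 11))      # standardized inputs: mean ~0, std ~1
w = rng.standard_normal((11, 6)) * 0.1   # small random weights

pre_zero = x @ w                          # biases at 0
pre_uniform = x @ w + rng.uniform(0, 1, size=6)

print(pre_zero.mean())     # close to 0: sigmoid/relu start in their responsive region
print(pre_uniform.mean())  # shifted to ~0.5 on average: closer to saturation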
So your biases should be:
biases = {
    'b1': tf.Variable(tf.zeros([n_hidden_1])),
    'b2': tf.Variable(tf.zeros([n_hidden_2])),
    'output': tf.Variable(tf.zeros([n_output]))
}
This is sufficient to output numbers smaller than 0.5:
[1. 0.4492423 0.4492423 ... 0.4492423 0.4492423 1. ]
predictions mean: 0.7023628
confusion matrix:
[[4370 1727]
[1932 3971]]
accuracy: 0.6950833333333334
Further corrections:
Your neuralNetwork function does not take a biases parameter. It instead uses the dictionary defined in the enclosing scope, which looks like a mistake.
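The fix, used in the full example below, is simply to pass the dictionary explicitly:
def neuralNetwork(x, weights, biases):
    ...  # same body as before, but biases now refers to the argument

output = neuralNetwork(x, weights, biases)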
You should not fit the scaler to the test data, because you would overwrite the statistics computed from the training data and because it leaks information: the test set should be treated as purely unseen data. Do this instead:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
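If you want to sanity-check this, the scaler's statistics should come only from the training split (a throwaway check, assuming the x_train/x_test arrays from the snippet above):
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)   # mean_/scale_ are computed here, from train only
x_test_scaled = scaler.transform(x_test)         # reuses the training statistics

print(np.allclose(scaler.mean_, x_train.mean(axis=0)))   # True: the mean comes from the training data
print(np.allclose(x_test_scaled.mean(axis=0), 0.0))      # typically False: the test mean is not forced to 0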
It's very uncommon to use MSE with sigmoid output. Use binary cross-entropy instead:
logits = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
output = tf.nn.sigmoid(logits)
cost = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
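Note that sigmoid_cross_entropy_with_logits expects the raw logits, not the sigmoid output, and it computes the same quantity as the textbook binary cross-entropy in a numerically stable way. A tiny illustrative check (TF1-style, reusing the names from the snippet above):
p = tf.nn.sigmoid(logits)
bce_manual = -(y * tf.log(p) + (1 - y) * tf.log(1 - p))                          # textbook BCE (less stable)
bce_builtin = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)   # stable built-in form
# Both yield the same per-example loss; you can also wrap either one in
# tf.reduce_mean to get a single scalar cost before calling minimize().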
It's more reliable to initialize the weights from a normal distribution:
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'output': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
}
You are feeding the entire training set at each epoch instead of using mini-batches, which is what Keras does by default. Therefore, it's reasonable to assume a Keras implementation would converge faster and the results might differ.
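If you want to mimic that behaviour, a rough sketch of a mini-batch loop (batch size picked arbitrarily, reusing the session, placeholders and arrays from the full example below) would be:
batch_size = 32  # arbitrary; Keras' fit() also defaults to 32
n_batches = (len(x_train) + batch_size - 1) // batch_size

for epoch in range(training_epochs):
    for b in range(n_batches):
        start, end = b * batch_size, (b + 1) * batch_size
        session.run(optimizer, feed_dict={x: x_train[start:end], y: y_train[start:end]})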
By making a few tweaks, I managed to achieve these results:
import tensorflow as tf
from keras.datasets.mnist import load_data
from sacred import Experiment
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
ex = Experiment('test-16')
@ex.config
def my_config():
    training_epochs = 200
    n_input = 784
    n_hidden_1 = 32
    n_hidden_2 = 32
    n_output = 1

def neuralNetwork(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    logits = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
    predictions = tf.nn.sigmoid(logits)
    return logits, predictions
@ex.automain
def main(training_epochs, n_input, n_hidden_1, n_hidden_2, n_output):
    (x_train, y_train), _ = load_data()
    x_train = x_train.reshape(x_train.shape[0], -1).astype(float)
    y_train = (y_train % 2 == 0).reshape(-1, 1).astype(float)
    x_train, x_test, y_train, y_test = train_test_split(x_train, y_train, test_size=0.2, random_state=101)
    print('y samples:', y_train, y_test, sep='\n')

    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)

    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
        'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
        'output': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
    }
    biases = {
        'b1': tf.Variable(tf.zeros([n_hidden_1])),
        'b2': tf.Variable(tf.zeros([n_hidden_2])),
        'output': tf.Variable(tf.zeros([n_output]))
    }

    x = tf.placeholder('float', [None, n_input])   # [?, 784]
    y = tf.placeholder('float', [None, n_output])  # [?, 1]

    logits, output = neuralNetwork(x, weights, biases)
    # cost = tf.reduce_mean(tf.square(output - y))
    cost = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        try:
            for epoch in range(training_epochs):
                print('epoch #%i' % epoch)
                session.run(optimizer, feed_dict={x: x_train, y: y_train})
        except KeyboardInterrupt:
            print('interrupted')
        print('Model has completed training.')

        p = session.run(output, feed_dict={x: x_test})
        p_labels = (p > 0.5).astype(int)
        print(p.ravel())
        print('predictions mean:', p.mean())
        print('confusion matrix:', confusion_matrix(y_test, p_labels), sep='\n')
        print('accuracy:', accuracy_score(y_test, p_labels))
[0. 1. 0. ... 0.0302309 0. 1. ]
predictions mean: 0.48261687
confusion matrix:
[[5212 885]
[ 994 4909]]
accuracy: 0.8434166666666667