Getting the same predicted values for all inputs in a trained TensorFlow network

I have created a TensorFlow network designed to read data from this dataset (note: the information in this dataset is purely for test purposes and is not real) [screenshot of the dataset] and am trying to predict values in its 'Exited' column. My network takes 11 inputs, passes them through 2 hidden layers (6 neurons each) with ReLU activations, and produces a single output through a sigmoid activation, so it can be read as a probability. I am using the Adam optimizer and a mean squared error cost function. However, after training the network on my training data and predicting on my testing data, all of my predicted values are greater than 0.5 (i.e. 'likely to be true'), and I'm not sure what the problem is:

X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.2, random_state=101)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.fit_transform(X_test)

training_epochs = 200
n_input = 11
n_hidden_1 = 6
n_hidden_2 = 6
n_output = 1

def neuralNetwork(x, weights):
     layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
     layer_1 = tf.nn.relu(layer_1)
     layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
     layer_2 = tf.nn.relu(layer_2)
     output_layer = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
     output_layer = tf.nn.sigmoid(output_layer)
     return output_layer

weights = {
    'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2])),
    'output': tf.Variable(tf.random_uniform([n_hidden_2, n_output]))
}

biases = {
    'b1': tf.Variable(tf.random_uniform([n_hidden_1])),
    'b2': tf.Variable(tf.random_uniform([n_hidden_2])),
    'output': tf.Variable(tf.random_uniform([n_output]))
}

x = tf.placeholder('float', [None, n_input]) # [?, 11]
y = tf.placeholder('float', [None, n_output]) # [?, 1]

output = neuralNetwork(x, weights)
cost = tf.reduce_mean(tf.square(output - y))
optimizer = tf.train.AdamOptimizer().minimize(cost)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        session.run(optimizer, feed_dict={x:X_train, y:y_train.reshape((-1,1))})
    print('Model has completed training.')
    test = session.run(output, feed_dict={x:X_test})
    predictions = (test>0.5).astype(int)
    print(predictions)

Any help is appreciated! I have been looking through questions related to my problem, but none of the suggestions has helped.

asked Apr 21 '18 by mlz7

1 Answer

Initial assumption: I won't access data from a personal link, for security reasons. It would be better if you could create a reproducible code snippet based solely on secure, persistent artifacts.

However, I can confirm your problem happens when your code is run against keras.datasets.mnist, with a small change: each sample is labeled 0 (odd digit) or 1 (even digit).

Short answer: you messed up the initialization. Change tf.random_uniform to tf.random_normal and set biases to a deterministic 0.

Actual answer: ideally, you want the model to start out predicting randomly, close to 0.5. This prevents the sigmoid's output from saturating and keeps the gradients large in the early stages of training.

The sigmoid's equation is s(y) = 1 / (1 + e**(-y)), and s(y) = 0.5 <=> y = 0. Therefore, the layer's pre-activation y = w * x + b must start out close to 0.
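A quick numeric check of both points (plain NumPy; this sketch is mine, not code from the answer):

import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_grad(y):
    s = sigmoid(y)
    return s * (1.0 - s)  # ds/dy = s(y) * (1 - s(y))

for y in (0.0, 2.0, 8.0):
    print(y, round(sigmoid(y), 4), round(sigmoid_grad(y), 4))
# 0.0 0.5    0.25    <- the decision boundary, where the gradient is largest
# 2.0 0.8808 0.105
# 8.0 0.9997 0.0003  <- saturated: almost no gradient flows back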

If you used StandardScaler, each of your input features is rescaled to mean = 0.0, std = 1.0. Your initialization should respect that centering! However, you've initialized your weights and biases with tf.random_uniform, which draws values uniformly from the [0, 1) interval, so they are all positive.
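A quick way to verify the standardization (my sketch; the exponential draw is just an arbitrary non-centered input):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.exponential(scale=3.0, size=(1000, 11))  # deliberately non-centered data
Xs = StandardScaler().fit_transform(X)

print(Xs.mean(axis=0).round(6))  # ~0 for every feature after scaling
print(Xs.std(axis=0).round(6))   # ~1 for every feature after scaling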

By starting your biases at 0 (and drawing the weights symmetrically around 0), the positive and negative terms cancel in expectation, so y stays close to 0:

y = w * x + b = sum(.1 * -1, .9 * -.9, ..., .1 * 1, .9 * .9) + 0 = 0
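To see why this matters across layers, here is a small NumPy simulation of the untrained network's output under both schemes (my sketch, mirroring the question's 11-6-6-1 architecture; not code from the original answer):

import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.standard_normal((1000, 11))  # stand-in for standardized inputs

def initial_output(w_init, b_init):
    h1 = relu(x @ w_init((11, 6)) + b_init(6))
    h2 = relu(h1 @ w_init((6, 6)) + b_init(6))
    return sigmoid(h2 @ w_init((6, 1)) + b_init(1))

uniform = lambda shape: rng.uniform(0.0, 1.0, shape)  # mimics tf.random_uniform
normal = lambda shape: rng.standard_normal(shape)     # mimics tf.random_normal
zeros = lambda shape: np.zeros(shape)

print(initial_output(uniform, uniform).mean())  # ~1.0: all-positive weights pile up, every sigmoid saturates high
print(initial_output(normal, zeros).mean())     # much closer to 0.5 before any training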

So your biases should be:

biases = {
    'b1': tf.Variable(tf.zeros([n_hidden_1])),
    'b2': tf.Variable(tf.zeros([n_hidden_2])),
    'output': tf.Variable(tf.zeros([n_output]))
}

This change alone is enough to produce predictions on both sides of 0.5:

[1.        0.4492423 0.4492423 ... 0.4492423 0.4492423 1.       ]
predictions mean: 0.7023628
confusion matrix:
[[4370 1727]
 [1932 3971]]
accuracy: 0.6950833333333334

Further corrections:

  • Your neuralNetwork function does not take a biases parameter; it silently uses the dict defined in the enclosing scope, which looks like a mistake.

  • You should not fit the scaler to the test data: you would throw away the statistics learned from the training set, and it violates the principle that that chunk of data is purely observational. Do this:

     scaler = StandardScaler()
     x_train = scaler.fit_transform(x_train)
     x_test = scaler.transform(x_test)
    
  • It's very uncommon to use MSE with a sigmoid output. Use binary cross-entropy instead:

     logits = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
     output = tf.nn.sigmoid(logits)
     # sigmoid_cross_entropy_with_logits returns one loss per sample; reduce it to a scalar
     cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    
  • It's more reliable to initialize the weights from a normal distribution:

     weights = {
         'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
         'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
         'output': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
     }
    
  • You are feeding the entire training set at each epoch instead of batching it, which is the default behavior in Keras. It's therefore reasonable to expect a Keras implementation to converge faster and the results to differ; see the batching sketch just after this list.
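For reference, here is a minimal mini-batch loop. This is my own sketch, not code from the original answer: batch_size = 32 is an assumption (Keras' default), and it reuses the session, placeholders, and optimizer defined in the script below.

import numpy as np

batch_size = 32  # assumption: Keras' default batch size

for epoch in range(training_epochs):
    idx = np.random.permutation(len(x_train))  # reshuffle samples every epoch
    for start in range(0, len(x_train), batch_size):
        batch = idx[start:start + batch_size]
        session.run(optimizer, feed_dict={x: x_train[batch], y: y_train[batch]})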

By making a few tweaks, I managed to achieve these results:

import tensorflow as tf
from keras.datasets.mnist import load_data
from sacred import Experiment
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

ex = Experiment('test-16')


@ex.config
def my_config():
    training_epochs = 200
    n_input = 784
    n_hidden_1 = 32
    n_hidden_2 = 32
    n_output = 1


def neuralNetwork(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    logits = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
    predictions = tf.nn.sigmoid(logits)
    return logits, predictions


@ex.automain
def main(training_epochs, n_input, n_hidden_1, n_hidden_2, n_output):
    (x_train, y_train), _ = load_data()
    x_train = x_train.reshape(x_train.shape[0], -1).astype(float)
    y_train = (y_train % 2 == 0).reshape(-1, 1).astype(float)

    x_train, x_test, y_train, y_test = train_test_split(x_train, y_train, test_size=0.2, random_state=101)
    print('y samples:', y_train, y_test, sep='\n')

    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)

    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
        'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
        'output': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
    }

    biases = {
        'b1': tf.Variable(tf.zeros([n_hidden_1])),
        'b2': tf.Variable(tf.zeros([n_hidden_2])),
        'output': tf.Variable(tf.zeros([n_output]))
    }

    x = tf.placeholder('float', [None, n_input])  # [?, 784]
    y = tf.placeholder('float', [None, n_output])  # [?, 1]

    logits, output = neuralNetwork(x, weights, biases)
    # cost = tf.reduce_mean(tf.square(output - y))
    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        try:
            for epoch in range(training_epochs):
                print('epoch #%i' % epoch)
                session.run(optimizer, feed_dict={x: x_train, y: y_train})

        except KeyboardInterrupt:
            print('interrupted')

        print('Model has completed training.')
        p = session.run(output, feed_dict={x: x_test})
        p_labels = (p > 0.5).astype(int)

        print(p.ravel())
        print('predictions mean:', p.mean())

        print('confusion matrix:', confusion_matrix(y_test, p_labels), sep='\n')
        print('accuracy:', accuracy_score(y_test, p_labels))
Output:

[0.        1.        0.        ... 0.0302309 0.        1.       ]
predictions mean: 0.48261687
confusion matrix:
[[5212  885]
 [ 994 4909]]
accuracy: 0.8434166666666667
answered Nov 15 '22 by ldavid