 

TensorFlow deep neural network for regression always predicts the same results in one batch

I am using TensorFlow to implement a simple multi-layer perceptron for regression. The code is modified from the standard MNIST classifier: I only changed the output cost to MSE (using tf.reduce_mean(tf.square(pred-y))) and some input/output size settings. However, when I train the network for regression, after several epochs the predictions in an output batch are all identical. For example:

target: 48.129, estimated: 42.634
target: 46.590, estimated: 42.634
target: 34.209, estimated: 42.634
target: 69.677, estimated: 42.634
......

I have tried different batch sizes, different initializations, and input normalization with sklearn.preprocessing.scale (the ranges of my inputs are quite different), but none of them worked. I have also tried one of the sklearn-style examples from TensorFlow (Deep Neural Network Regression with Boston Data), but I got another error at line 40:

'module' object has no attribute 'infer_real_valued_columns_from_input'

Does anyone have a clue where the problem is? Thank you.

My code is listed below; it may be a little long, but it is very straightforward:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import learn
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn import datasets, linear_model
from sklearn import cross_validation
import numpy as np

boston = learn.datasets.load_dataset('boston')
x, y = boston.data, boston.target
X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(
    x, y, test_size=0.2, random_state=42)

total_len = X_train.shape[0]

# Parameters
learning_rate = 0.001
training_epochs = 500
batch_size = 10
display_step = 1
dropout_rate = 0.9

# Network Parameters
n_hidden_1 = 32  # 1st layer number of features
n_hidden_2 = 200 # 2nd layer number of features
n_hidden_3 = 200
n_hidden_4 = 256
n_input = X_train.shape[1]
n_classes = 1

# tf Graph input
x = tf.placeholder("float", [None, 13])
y = tf.placeholder("float", [None])

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Hidden layer with RELU activation
    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    layer_3 = tf.nn.relu(layer_3)

    # Hidden layer with RELU activation
    layer_4 = tf.add(tf.matmul(layer_3, weights['h4']), biases['b4'])
    layer_4 = tf.nn.relu(layer_4)

    # Output layer with linear activation
    out_layer = tf.matmul(layer_4, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], 0, 0.1)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], 0, 0.1)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], 0, 0.1)),
    'h4': tf.Variable(tf.random_normal([n_hidden_3, n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_hidden_4, n_classes], 0, 0.1))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1], 0, 0.1)),
    'b2': tf.Variable(tf.random_normal([n_hidden_2], 0, 0.1)),
    'b3': tf.Variable(tf.random_normal([n_hidden_3], 0, 0.1)),
    'b4': tf.Variable(tf.random_normal([n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_classes], 0, 0.1))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred-y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(total_len/batch_size)
        # Loop over all batches
        for i in range(total_batch-1):
            batch_x = X_train[i*batch_size:(i+1)*batch_size]
            batch_y = Y_train[i*batch_size:(i+1)*batch_size]
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, p = sess.run([optimizer, cost, pred], feed_dict={x: batch_x,
                                                                   y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch

        # sample prediction
        label_value = batch_y
        estimate = p
        err = label_value - estimate
        print ("num batch:", total_batch)

        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
            print ("[*]----------------------------")
            for i in xrange(3):
                print ("label value:", label_value[i], \
                    "estimated value:", estimate[i])
            print ("[*]============================")

    print ("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print ("Accuracy:", accuracy.eval({x: X_test, y: Y_test}))
asked Jul 15 '16 by Sufeng Niu



1 Answer

Short answer:

Transpose your pred vector using tf.transpose(pred).

Longer answer:

The problem is that pred (the predictions) and y (the labels) do not have the same shape: pred is a column vector of shape [batch_size, 1], while y is a flat vector of shape [batch_size]. When you apply an element-wise operation such as pred - y to them, broadcasting produces a [batch_size, batch_size] matrix, which is not what you want. Minimising the mean of that matrix pushes every prediction towards the mean of the batch labels, which is exactly why all the estimates end up identical.
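To make the shape mismatch concrete, here is a minimal, hypothetical sketch with a batch size of 4 (NumPy follows the same broadcasting rules TensorFlow applies here; the values are placeholders, not from the original post):

import numpy as np

pred = np.zeros((4, 1))    # predictions: shape [batch_size, 1], as returned by the network
y = np.zeros(4)            # labels: shape [batch_size], as fed to the placeholder
print((pred - y).shape)    # prints (4, 4) -- broadcasting yields a matrix, not a vector of errors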

The solution is to transpose the prediction vector using tf.transpose() to get a proper vector and thus a proper loss function. Actually, if you set the batch size to 1 in your example you'll see that it works even without the fix, because transposing a 1x1 vector is a no-op.
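Concretely, only the cost line of the posted code needs to change. A sketch of the fix, using the question's variable names and assuming pred has shape [batch_size, 1] while y is fed as a flat [batch_size] vector:

# Transpose pred from [batch_size, 1] to [1, batch_size] so that pred - y
# broadcasts to a vector of per-example errors instead of a full matrix.
cost = tf.reduce_mean(tf.square(tf.transpose(pred) - y))

Reshaping the prediction to a flat vector (for example with tf.reshape(pred, [-1])) or feeding y with shape [None, 1] achieves the same thing; the essential point is that the two tensors must have matching shapes before the subtraction.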

I applied this fix to your example code and observed the following behaviour. Before the fix:

Epoch: 0245 cost= 84.743440580
[*]----------------------------
label value: 23 estimated value: [ 27.47437096]
label value: 50 estimated value: [ 24.71126747]
label value: 22 estimated value: [ 23.87785912]

And after the fix at the same point in time:

Epoch: 0245 cost= 4.181439120
[*]----------------------------
label value: 23 estimated value: [ 21.64333534]
label value: 50 estimated value: [ 48.76105118]
label value: 22 estimated value: [ 24.27996063]

You'll see that the cost is much lower and that it actually learned the value 50 properly. You'll have to do some fine-tuning on the learning rate and such to improve your results of course.

answered Oct 14 '22 by CNugteren