I am trying to write an MLP with TensorFlow (which I have just started to learn, so apologies for the code!) for multivariate REGRESSION (no MNIST, please). Here is my MWE, which uses the linnerud dataset from sklearn. (In reality I am using a much larger dataset. I am also using only one hidden layer here to keep the MWE small, but I can add more if necessary.) By the way, I am using shuffle = False in train_test_split only because in reality I am working with a time series dataset.
MWE
######################### import stuff ##########################
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.datasets import load_linnerud
from sklearn.model_selection import train_test_split
######################## prepare the data ########################
X, y = load_linnerud(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle = False, test_size = 0.33)
######################## set learning variables ##################
learning_rate = 0.0001
epochs = 100
batch_size = 3
######################## set some variables #######################
x = tf.placeholder(tf.float32, [None, 3], name = 'x') # 3 features
y = tf.placeholder(tf.float32, [None, 3], name = 'y') # 3 outputs
# input-to-hidden layer1
W1 = tf.Variable(tf.truncated_normal([3,300], stddev = 0.03), name = 'W1')
b1 = tf.Variable(tf.truncated_normal([300]), name = 'b1')
# hidden layer1-to-output
W2 = tf.Variable(tf.truncated_normal([300,3], stddev = 0.03), name= 'W2')
b2 = tf.Variable(tf.truncated_normal([3]), name = 'b2')
######################## Activations, outputs ######################
# output hidden layer 1
hidden_out = tf.nn.relu(tf.add(tf.matmul(x, W1), b1))
# total output
y_ = tf.nn.relu(tf.add(tf.matmul(hidden_out, W2), b2))
####################### Loss Function #########################
mse = tf.losses.mean_squared_error(y, y_)
####################### Optimizer #########################
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(mse)
###################### Initialize, Accuracy and Run #################
# initialize variables
init_op = tf.global_variables_initializer()
# accuracy for the test set
accuracy = tf.reduce_mean(tf.square(tf.subtract(y, y_))) # or could use tf.losses.mean_squared_error
#run
with tf.Session() as sess:
    sess.run(init_op)
    total_batch = int(len(y_train) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = X_train[i*batch_size:min(i*batch_size + batch_size, len(X_train)), :], y_train[i*batch_size:min(i*batch_size + batch_size, len(y_train)), :]
            _, c = sess.run([optimizer, mse], feed_dict = {x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print('Epoch:', (epoch+1), 'cost =', '{:.3f}'.format(avg_cost))
    print(sess.run(mse, feed_dict = {x: X_test, y:y_test}))
This prints out something like this:
...
Epoch: 98 cost = 10992.617
Epoch: 99 cost = 10992.592
Epoch: 100 cost = 10992.566
11815.1
So obviously something is wrong. I suspect the problem is either in the cost function/accuracy or in the way I am using batches, but I can't quite figure it out.
MLPs are suitable for classification prediction problems where inputs are assigned a class or label. They are also suitable for regression prediction problems where a real-valued quantity is predicted given a set of inputs.
Yes, a perceptron (a single fully connected unit) can be used for regression. With no activation function it is just a linear regressor; with a sigmoid activation its output lies between 0 and 1 and it acts as a binary classifier.
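To make the difference concrete, here is a minimal NumPy sketch of a single unit used both ways. The helper names (`unit`, `sigmoid`) and the input, weight and bias values are made up for illustration and do not come from the question's data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def unit(x, w, b, activation=None):
    """One fully connected unit: weighted sum plus bias, then an optional activation."""
    z = x @ w + b
    return z if activation is None else activation(z)

# Illustrative values only (assumptions, not taken from the linnerud data).
x = np.array([[1.0, 2.0, 3.0],
              [0.0, -1.0, 0.5]])
w = np.array([0.5, -0.25, 1.0])
b = 0.1

linear_out = unit(x, w, b)            # unbounded real values: linear regression
prob_out = unit(x, w, b, sigmoid)     # values in (0, 1): usable as class probabilities
labels = (prob_out >= 0.5).astype(int)

print(linear_out)
print(prob_out, labels)
```

The only change between the two calls is the activation, which is why the choice of output activation matters so much for a regression network.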
An MLP is a typical example of a feedforward artificial neural network. In the usual notation, the i-th activation unit in the l-th layer is denoted a_i^(l). The number of layers and the number of neurons per layer are hyperparameters of the network, and they need tuning.
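As a rough illustration of how the layer count and layer widths act as hyperparameters, the following NumPy sketch builds a feedforward pass from a list of layer sizes. The function names (`init_mlp`, `forward`), the zero bias initialization and the linear output layer are assumptions made for this sketch; they are not the code from the question or the answer.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create one (W, b) pair per layer transition; layer_sizes is the hyperparameter."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(scale=0.03, size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(x, params):
    """Return the activations of every layer, with the input as layer 0."""
    activations = [x]
    for l, (W, b) in enumerate(params):
        z = activations[-1] @ W + b
        is_output = (l == len(params) - 1)
        # ReLU on hidden layers, linear output (an assumption suited to regression).
        a = z if is_output else np.maximum(z, 0.0)
        activations.append(a)
    return activations

layer_sizes = [3, 10, 3]      # 3 inputs, one hidden layer of 10 units, 3 outputs
params = init_mlp(layer_sizes)
x = np.ones((4, 3))           # a dummy batch of 4 samples
acts = forward(x, params)
print([a.shape for a in acts])  # [(4, 3), (4, 10), (4, 3)]
```

Here `acts[l][:, i]` holds the activation of unit i in layer l for every sample in the batch (both counted from zero), and changing `layer_sizes` to, say, `[3, 300, 3]` or `[3, 10, 10, 3]` changes the architecture without touching the rest of the code.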
Convolutional neural networks (CNNs, or ConvNets) are essential tools for deep learning, and are especially suited for analyzing image data. For example, you can use CNNs to classify images. To predict continuous data, such as angles and distances, you can include a regression layer at the end of the network.
As far as I can see, the model is learning. I tuned some of the hyperparameters (most significantly the learning rate and the hidden layer size) and got much better results. Here's the full code:
######################### import stuff ##########################
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.datasets import load_linnerud
from sklearn.model_selection import train_test_split
######################## prepare the data ########################
X, y = load_linnerud(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)
######################## set learning variables ##################
learning_rate = 0.0005
epochs = 2000
batch_size = 3
######################## set some variables #######################
x = tf.placeholder(tf.float32, [None, 3], name='x') # 3 features
y = tf.placeholder(tf.float32, [None, 3], name='y') # 3 outputs
# hidden layer 1
W1 = tf.Variable(tf.truncated_normal([3, 10], stddev=0.03), name='W1')
b1 = tf.Variable(tf.truncated_normal([10]), name='b1')
# output layer (hidden layer 1 to output)
W2 = tf.Variable(tf.truncated_normal([10, 3], stddev=0.03), name='W2')
b2 = tf.Variable(tf.truncated_normal([3]), name='b2')
######################## Activations, outputs ######################
# output hidden layer 1
hidden_out = tf.nn.relu(tf.add(tf.matmul(x, W1), b1))
# total output (the ReLU restricts predictions to non-negative values)
y_ = tf.nn.relu(tf.add(tf.matmul(hidden_out, W2), b2))
####################### Loss Function #########################
mse = tf.losses.mean_squared_error(y, y_)
####################### Optimizer #########################
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(mse)
###################### Initialize, Accuracy and Run #################
# initialize variables
init_op = tf.global_variables_initializer()
# accuracy for the test set
accuracy = tf.reduce_mean(tf.square(tf.subtract(y, y_))) # or could use tf.losses.mean_squared_error
# run
with tf.Session() as sess:
    sess.run(init_op)
    total_batch = int(len(y_train) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = X_train[i * batch_size:min(i * batch_size + batch_size, len(X_train)), :], \
                               y_train[i * batch_size:min(i * batch_size + batch_size, len(y_train)), :]
            _, c = sess.run([optimizer, mse], feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        # report the average training cost every 10 epochs
        if epoch % 10 == 0:
            print('Epoch:', (epoch + 1), 'cost =', '{:.3f}'.format(avg_cost))
    # mean squared error on the held-out test set
    print(sess.run(mse, feed_dict={x: X_test, y: y_test}))
Output:
Epoch: 1901 cost = 173.914
Epoch: 1911 cost = 171.928
Epoch: 1921 cost = 169.993
Epoch: 1931 cost = 168.110
Epoch: 1941 cost = 166.277
Epoch: 1951 cost = 164.492
Epoch: 1961 cost = 162.753
Epoch: 1971 cost = 161.061
Epoch: 1981 cost = 159.413
Epoch: 1991 cost = 157.808
482.433
I think you could tune it even further, but that makes little sense with so little data. I didn't experiment with regularization, but I'm sure you'll need L2 regularization or dropout to avoid overfitting (the test error of 482.433 is already well above the final training cost of about 158).
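For reference, here is a minimal NumPy sketch of the two regularizers mentioned above: an L2 penalty added to the MSE, and inverted dropout applied to hidden activations. The function names, the `lam` and `keep_prob` values and the toy arrays are assumptions for illustration; none of this was run against the linnerud data.

```python
import numpy as np

def l2_penalized_mse(y_true, y_pred, weights, lam=0.01):
    """MSE plus lam times the sum of squared weights (biases left unpenalized)."""
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lam * sum(np.sum(W ** 2) for W in weights)
    return mse + penalty

def dropout(a, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob

rng = np.random.default_rng(0)
# Weight shapes mirror the tuned network (3 -> 10 -> 3); the values are random.
W1 = rng.normal(scale=0.03, size=(3, 10))
W2 = rng.normal(scale=0.03, size=(10, 3))

# Toy targets and predictions, made up for this sketch.
y_true = np.array([[1.0, 2.0, 3.0]])
y_pred = np.array([[1.5, 2.0, 2.0]])

plain = np.mean((y_true - y_pred) ** 2)
penalized = l2_penalized_mse(y_true, y_pred, [W1, W2], lam=0.01)

hidden = np.ones((2, 10))
dropped = dropout(hidden, keep_prob=0.5, rng=rng)  # training time only

print(plain, penalized)
print(dropped)
```

In the TensorFlow 1.x code above, the analogous building blocks would be `tf.nn.l2_loss` (which returns half the sum of squares of a tensor) added to `mse`, and `tf.nn.dropout` applied to `hidden_out` during training only. How much either helps on a dataset this small is something you would have to check empirically.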