I am working on the Linear Regression with Synthetic Data Colab exercise, which explores linear regression with a toy dataset. There is a linear regression model built and trained and one can play around with the learning rate, the epoch and the batch size. I have troubles understanding how exactly the iterations are done and how this connects to the "epoch" and the "batch size". I am basically not getting how the actual model is trained, how data is processed and iterations are done. To understand this I wanted to follow this by calculating each step manually. Therefore I wanted to have the slope and intercept coefficient for each step. So that I can see what kind of data the "computer" uses, puts into the model, what kind of model results at each specific iteration and how iterations are done. I tried first to get the slope and intercept for each single step, however failed, because only at the end the slope and intercept is outputted. My modified code (original, just added:)
print("Slope")
print(trained_weight)
print("Intercept")
print(trained_bias)
code:
import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt
#@title Define the functions that build and train a model
def build_model(my_learning_rate):
"""Create and compile a simple linear regression model."""
# Most simple tf.keras models are sequential.
# A sequential model contains one or more layers.
model = tf.keras.models.Sequential()
# Describe the topography of the model.
# The topography of a simple linear regression model
# is a single node in a single layer.
model.add(tf.keras.layers.Dense(units=1,
input_shape=(1,)))
# Compile the model topography into code that
# TensorFlow can efficiently execute. Configure
# training to minimize the model's mean squared error.
model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=my_learning_rate),
loss="mean_squared_error",
metrics=[tf.keras.metrics.RootMeanSquaredError()])
return model
def train_model(model, feature, label, epochs, batch_size):
"""Train the model by feeding it data."""
# Feed the feature values and the label values to the
# model. The model will train for the specified number
# of epochs, gradually learning how the feature values
# relate to the label values.
history = model.fit(x=feature,
y=label,
batch_size=batch_size,
epochs=epochs)
# Gather the trained model's weight and bias.
trained_weight = model.get_weights()[0]
trained_bias = model.get_weights()[1]
print("Slope")
print(trained_weight)
print("Intercept")
print(trained_bias)
# The list of epochs is stored separately from the
# rest of history.
epochs = history.epoch
# Gather the history (a snapshot) of each epoch.
hist = pd.DataFrame(history.history)
# print(hist)
# Specifically gather the model's root mean
#squared error at each epoch.
rmse = hist["root_mean_squared_error"]
return trained_weight, trained_bias, epochs, rmse
print("Defined create_model and train_model")
#@title Define the plotting functions
def plot_the_model(trained_weight, trained_bias, feature, label):
"""Plot the trained model against the training feature and label."""
# Label the axes.
plt.xlabel("feature")
plt.ylabel("label")
# Plot the feature values vs. label values.
plt.scatter(feature, label)
# Create a red line representing the model. The red line starts
# at coordinates (x0, y0) and ends at coordinates (x1, y1).
x0 = 0
y0 = trained_bias
x1 = my_feature[-1]
y1 = trained_bias + (trained_weight * x1)
plt.plot([x0, x1], [y0, y1], c='r')
# Render the scatter plot and the red line.
plt.show()
def plot_the_loss_curve(epochs, rmse):
"""Plot the loss curve, which shows loss vs. epoch."""
plt.figure()
plt.xlabel("Epoch")
plt.ylabel("Root Mean Squared Error")
plt.plot(epochs, rmse, label="Loss")
plt.legend()
plt.ylim([rmse.min()*0.97, rmse.max()])
plt.show()
print("Defined the plot_the_model and plot_the_loss_curve functions.")
my_feature = ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
my_label = ([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])
learning_rate=0.05
epochs=1
my_batch_size=12
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
my_label, epochs,
my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
In my specific case my output was:
Now I tried to replicate this in a simple excel sheet and calculated the rmse manually:
However, I get 21.8 and not 23.1? Also my loss is not 535.48, but 476.82
My first question is therefore: Where is my mistake, how is the rmse calculated?
Second question(s): How can I get the rmse for each specific iteration? Let's consider epoch is 4 and batch size is 4.
That gives 4 epochs and 3 batches with each 4 examples (observations). I don't understand how the model is trained with these iterations. So how can I get the coefficients of each regression model and rmse? Not just for each epoch (so 4), but for each iteration. I think each epoch has 3 iterations. So in total I think 12 linear regression models result? I would like to see these 12 models. What are the initial values used in the starting point when no information is given, what kind of slope and intercept is used? The starting at the really first point. I don't specify this. Then I would like to be able follow how the slope and intercepts are adapted at each step. This will be from the gradient descent algorithm I think. But that would be the super plus. More important for me is first to understand how these iterations are done and how they connect to the epoch and batch.
Update: I know that the initial values (for the slope and intercept) are choosen randomly.
Lets consider a linear regression model for a set of samples X
where each sample is represented by one feature x
. As part of model training, we are searching for the line w.x + b
such that ((w.x+b) -y )^2
(squared loss) is minimal. For a set of data points we take mean of squared loss for each sample and so called mean squared error (MSE). The w
and b
which stands for weight and bias are together referred to as weights.
(X^T.X)^-1.X^T.y
A gradient decent algorithm for learning regression looks like blow
w, b = some initial value
While model has not converged:
y_hat = w.X + b
error = MSE(y, y_hat)
back propagate (BPP) error and adjust weights
Each run of the above loop is called an epoch. However due to resource constrains the calculation of y_hat
, error
and BPP is not preformed on full dataset, instead the data is divided into smaller batches and the above operations are performed on one batch at a time. Also we normally fix the number of epoch and monitor if the model has converged.
w, b = some initial value
for i in range(number_of_epochs)
for X_batch,y_batch in get_next_batch(X, y)
y_hat = w.X_batch + b
error = MSE(y_batch, y_hat)
back propagate (BPP) error and adjust weights
Lets say we would like to add root mean squared error for tracing the model performance while it is training. The way Keras implements is as below
w, b = some initial value
for i in range(number_of_epochs)
all_y_hats = []
all_ys = []
for X_batch,y_batch in get_next_batch(X, y)
y_hat = w.X_batch + b
error = MSE(y_batch, y_hat)
all_y_hats.extend(y_hat)
all_ys.extend(y_batch)
batch_rms_error = RMSE(all_ys, all_y_hats)
back propagate (BPP) error and adjust weights
As you can see above, the predictions are accumulated and RMSE is calculated on the accumulated predictions rather then taking the mean of the all previous batch RMSE.
Now that our foundation is clear, lets see how we can implement tracking the same in keras. keras has callbacks, so we can hook into on_batch_begin
callback and accumulate the all_y_hats
and all_ys
. On the on_batch_end
callback keras gives us the calculated RMSE
. We will manually calculate RMSE
using our accumulated all_y_hats
and all_ys
and verify if it is same as what keras calculated. We will also save the weights so that we can later plot the line which is being learned.
import numpy as np
from sklearn.metrics import mean_squared_error
import keras
import matplotlib.pyplot as plt
# Some training data
X = np.arange(16)
y = 0.5*X +0.2
batch_size = 8
all_y_hats = []
learned_weights = []
class CustomCallback(keras.callbacks.Callback):
def on_batch_begin(self, batch, logs={}):
w = self.model.layers[0].weights[0].numpy()[0][0]
b = self.model.layers[0].weights[1].numpy()[0]
s = batch*batch_size
all_y_hats.extend(b + w*X[s:s+batch_size])
learned_weights.append([w,b])
def on_batch_end(self, batch, logs={}):
calculated_error = np.sqrt(mean_squared_error(all_y_hats, y[:len(all_y_hats)]))
print (f"\n Calculated: {calculated_error}, Actual: {logs['root_mean_squared_error']}")
assert np.isclose(calculated_error, logs['root_mean_squared_error'])
def on_epoch_end(self, batch, logs={}):
del all_y_hats[:]
model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape=(1,)))
model.compile(optimizer=keras.optimizers.RMSprop(lr=0.01), loss="mean_squared_error", metrics=[keras.metrics.RootMeanSquaredError()])
# We should set shuffle=False so that we know how baches are divided
history = model.fit(X,y, epochs=100, callbacks=[CustomCallback()], batch_size=batch_size, shuffle=False)
Output:
Epoch 1/100
8/16 [==============>...............] - ETA: 0s - loss: 16.5132 - root_mean_squared_error: 4.0636
Calculated: 4.063645694548688, Actual: 4.063645839691162
Calculated: 8.10112834945773, Actual: 8.101128578186035
16/16 [==============================] - 0s 3ms/step - loss: 65.6283 - root_mean_squared_error: 8.1011
Epoch 2/100
8/16 [==============>...............] - ETA: 0s - loss: 14.0454 - root_mean_squared_error: 3.7477
Calculated: 3.7477213352845675, Actual: 3.7477214336395264
-------------- truncated -----------------------
Ta-da! the assert assert np.isclose(calculated_error, logs['root_mean_squared_error'])
never failed so our calculation/understanding is correct.
Finally, lets plot the line which is being adjusted by the BPP algorithm based on the mean squared error loss. We can use the below code to create a png image of the line being learned at each batch along with the train data.
for i, (w,b) in enumerate(learned_weights):
plt.close()
plt.axis([-1, 18, -1, 10])
plt.scatter(X, y)
plt.plot([-1,17], [-1*w+b, 17*w+b], color='green')
plt.savefig(f'img{i+1}.png')
Below is the gif animation of the above images in the order they are learned.
The hyperplane (line in this case) being learned when y = 0.5*X +5.2
I tried to play with it a little, and I think it is working like this:
So basically I think that intuitively it can be told that first loss is computed, then weights are updated, which means, that weights update is last operation in epoch.
If your model is trained using one epoch and one batch, then what you see on screen is loss computed on initial weights and bias. If you want to see loss and metrics after end of each epoch (with most "actual" weights), you can pass to parameter validation_data=(X,y)
to fit
method. That tells the algorithm to compute loss and metrics once again on this given validation data, when epoch is finished.
Regarding initial weights of model, you can try it when you manually set some initial weights to the layer (using kernel_initializer
parameter):
model.add(tf.keras.layers.Dense(units=1,
input_shape=(1,),
kernel_initializer=tf.constant_initializer(.5)))
Here is the updated part of train_model
function, which shows what I meant:
def train_model(model, feature, label, epochs, batch_size):
"""Train the model by feeding it data."""
# Feed the feature values and the label values to the
# model. The model will train for the specified number
# of epochs, gradually learning how the feature values
# relate to the label values.
init_slope = model.get_weights()[0][0][0]
init_bias = model.get_weights()[1][0]
print('init slope is {}'.format(init_slope))
print('init bias is {}'.format(init_bias))
history = model.fit(x=feature,
y=label,
batch_size=batch_size,
epochs=epochs,
validation_data=(feature,label))
# Gather the trained model's weight and bias.
#print(model.get_weights())
trained_weight = model.get_weights()[0]
trained_bias = model.get_weights()[1]
print("Slope")
print(trained_weight)
print("Intercept")
print(trained_bias)
# The list of epochs is stored separately from the
# rest of history.
prediction_manual = [trained_weight[0][0]*i + trained_bias[0] for i in feature]
manual_loss = np.mean(((np.array(label)-np.array(prediction_manual))**2))
print('manually computed loss after slope and bias update is {}'.format(manual_loss))
print('manually computed rmse after slope and bias update is {}'.format(manual_loss**(1/2)))
prediction_manual_init = [init_slope*i + init_bias for i in feature]
manual_loss_init = np.mean(((np.array(label)-np.array(prediction_manual_init))**2))
print('manually computed loss with init slope and bias is {}'.format(manual_loss_init))
print('manually copmuted loss with init slope and bias is {}'.format(manual_loss_init**(1/2)))
output:
"""
init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 402.9850 - root_mean_squared_error: 20.0745 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
Slope
[[0.65811384]]
Intercept
[0.15811387]
manually computed loss after slope and bias update is 352.3350379264957
manually computed rmse after slope and bias update is 18.77058970641295
manually computed loss with init slope and bias is 402.98499999999996
manually copmuted loss with init slope and bias is 20.074486294797182
"""
Note that manually computed loss and metrics after slope and bias update matches to validation loss and metrics and manually computed loss and metrics before update matches the loss and metrics of initial slope and bias.
Regarding second question, I think that you could split your data into batches manually and then iterate over each batch and fit on it. Then, in each iteration, model prints loss and metrics for validation data. Something like this:
init_slope = model.get_weights()[0][0][0]
init_bias = model.get_weights()[1][0]
print('init slope is {}'.format(init_slope))
print('init bias is {}'.format(init_bias))
batch_size = 3
for idx in range(0,len(feature),batch_size):
model.fit(x=feature[idx:idx+batch_size],
y=label[idx:idx+batch_size],
batch_size=1000,
epochs=epochs,
validation_data=(feature,label))
print('slope: {}'.format(model.get_weights()[0][0][0]))
print('intercept: {}'.format(model.get_weights()[1][0]))
print('x data used: {}'.format(feature[idx:idx+batch_size]))
print('y data used: {}'.format(label[idx:idx+batch_size]))
output:
init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 48.9000 - root_mean_squared_error: 6.9929 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
slope: 0.6581138372421265
intercept: 0.15811386704444885
x data used: [1.0, 2.0, 3.0]
y data used: [5.0, 8.8, 9.6]
1/1 [==============================] - 0s 21ms/step - loss: 200.9296 - root_mean_squared_error: 14.1750 - val_loss: 306.3082 - val_root_mean_squared_error: 17.5017
slope: 0.8132714033126831
intercept: 0.3018075227737427
x data used: [4.0, 5.0, 6.0]
y data used: [14.2, 18.8, 19.5]
1/1 [==============================] - 0s 22ms/step - loss: 363.2630 - root_mean_squared_error: 19.0595 - val_loss: 266.7119 - val_root_mean_squared_error: 16.3313
slope: 0.9573485255241394
intercept: 0.42669767141342163
x data used: [7.0, 8.0, 9.0]
y data used: [21.4, 26.8, 28.9]
1/1 [==============================] - 0s 22ms/step - loss: 565.5593 - root_mean_squared_error: 23.7815 - val_loss: 232.1553 - val_root_mean_squared_error: 15.2366
slope: 1.0924618244171143
intercept: 0.5409283638000488
x data used: [10.0, 11.0, 12.0]
y data used: [32.0, 33.8, 38.2]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With