Problems understanding linear regression model tuning in tf.keras

Question

I am working on the Linear Regression with Synthetic Data Colab exercise, which explores linear regression with a toy dataset. A linear regression model is built and trained, and one can play around with the learning rate, the number of epochs and the batch size. I have trouble understanding how exactly the iterations are done and how this connects to the "epoch" and the "batch size". Basically, I don't understand how the model is actually trained, how the data is processed and how the iterations are performed. To understand this, I wanted to follow the training by calculating each step manually, so I wanted the slope and intercept coefficients at each step. That way I could see what data the "computer" feeds into the model, what model results at each specific iteration, and how the iterations are done. I first tried to get the slope and intercept for every single step, but failed, because the slope and intercept are only output at the end. My modified code (the original, with only the following added):

  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)

code:

import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt

#@title Define the functions that build and train a model
def build_model(my_learning_rate):
  """Create and compile a simple linear regression model."""
  # Most simple tf.keras models are sequential. 
  # A sequential model contains one or more layers.
  model = tf.keras.models.Sequential()

  # Describe the topography of the model.
  # The topography of a simple linear regression model
  # is a single node in a single layer. 
  model.add(tf.keras.layers.Dense(units=1, 
                                  input_shape=(1,)))

  # Compile the model topography into code that 
  # TensorFlow can efficiently execute. Configure 
  # training to minimize the model's mean squared error. 
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=my_learning_rate),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.RootMeanSquaredError()])
 
  return model           


def train_model(model, feature, label, epochs, batch_size):
  """Train the model by feeding it data."""

  # Feed the feature values and the label values to the 
  # model. The model will train for the specified number 
  # of epochs, gradually learning how the feature values
  # relate to the label values. 
  history = model.fit(x=feature,
                      y=label,
                      batch_size=batch_size,
                      epochs=epochs)

  # Gather the trained model's weight and bias.
  trained_weight = model.get_weights()[0]
  trained_bias = model.get_weights()[1]
  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)
  # The list of epochs is stored separately from the 
  # rest of history.
  epochs = history.epoch

  # Gather the history (a snapshot) of each epoch.
  hist = pd.DataFrame(history.history)

 # print(hist)
  # Specifically gather the model's root mean 
  #squared error at each epoch. 
  rmse = hist["root_mean_squared_error"]

  return trained_weight, trained_bias, epochs, rmse

print("Defined build_model and train_model")

#@title Define the plotting functions
def plot_the_model(trained_weight, trained_bias, feature, label):
  """Plot the trained model against the training feature and label."""

  # Label the axes.
  plt.xlabel("feature")
  plt.ylabel("label")

  # Plot the feature values vs. label values.
  plt.scatter(feature, label)

  # Create a red line representing the model. The red line starts
  # at coordinates (x0, y0) and ends at coordinates (x1, y1).
  x0 = 0
  y0 = trained_bias
  x1 = feature[-1]
  y1 = trained_bias + (trained_weight * x1)
  plt.plot([x0, x1], [y0, y1], c='r')

  # Render the scatter plot and the red line.
  plt.show()

def plot_the_loss_curve(epochs, rmse):
  """Plot the loss curve, which shows loss vs. epoch."""

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Root Mean Squared Error")

  plt.plot(epochs, rmse, label="Loss")
  plt.legend()
  plt.ylim([rmse.min()*0.97, rmse.max()])
  plt.show()

print("Defined the plot_the_model and plot_the_loss_curve functions.")

my_feature = ([1.0, 2.0,  3.0,  4.0,  5.0,  6.0,  7.0,  8.0,  9.0, 10.0, 11.0, 12.0])
my_label   = ([5.0, 8.8,  9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

learning_rate=0.05
epochs=1
my_batch_size=12

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                         my_label, epochs,
                                                         my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

In my specific case my output was:

ex1

Now I tried to replicate this in a simple Excel sheet and calculated the RMSE manually:

eso

However, I get 21.8 and not 23.1. Also, my loss is not 535.48 but 476.82.

My first question is therefore: where is my mistake, and how is the RMSE calculated?

Second question(s): How can I get the RMSE for each specific iteration? Let's say the number of epochs is 4 and the batch size is 4.

exam

That gives 4 epochs and 3 batches of 4 examples (observations) each. I don't understand how the model is trained with these iterations. So how can I get the coefficients of each regression model and the RMSE, not just for each epoch (so 4), but for each iteration? I think each epoch has 3 iterations, so in total I think 12 linear regression models result, and I would like to see these 12 models. What initial values are used at the starting point when no information is given, i.e. what slope and intercept are used at the very first step? I don't specify this. Then I would like to be able to follow how the slope and intercept are adapted at each step. I think this comes from the gradient descent algorithm, but that would be a bonus. More important for me is to first understand how these iterations are done and how they connect to the epoch and batch.

Update: I know that the initial values (for the slope and intercept) are chosen randomly.

mujjiga · Accepted Answer

Foundation

Problem statement

Let's consider a linear regression model for a set of samples X where each sample is represented by one feature x. As part of model training, we search for the line w.x + b such that ((w.x + b) - y)^2 (the squared loss) is minimal. For a set of data points we take the mean of the squared loss over all samples, which is called the mean squared error (MSE). The w and b, which stand for weight and bias, are together referred to as weights.
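As a concrete check, the sketch below computes the MSE and RMSE by hand for the twelve points from the question, assuming a candidate line with w = 0.5 and b = 0 (the same starting values used in the second answer). The helper names mse and rmse are illustrative and not part of Keras.

```python
import math

# Data from the question.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
labels = [5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2]


def mse(w, b, xs, ys):
    """Mean of the squared residuals ((w*x + b) - y)^2."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def rmse(w, b, xs, ys):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(w, b, xs, ys))


# Assumed candidate line: slope 0.5, intercept 0.0.
print(mse(0.5, 0.0, features, labels))   # about 402.985
print(rmse(0.5, 0.0, features, labels))  # about 20.0745
```

These two numbers agree with the "loss with init slope and bias" values printed in the second answer, so the loss Keras reports is this MSE and the metric is its square root.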

Fitting the line/Training the model

There is a closed-form solution to the linear regression problem: w = (X^T.X)^-1.X^T.y
We can also use the gradient descent method to search for the weights that minimize the squared loss. Frameworks like TensorFlow and PyTorch use gradient descent to search for the weights (this is called training).
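For a single feature plus an intercept, the closed-form expression reduces to slope = covariance(x, y) / variance(x) and intercept = mean(y) - slope * mean(x). A minimal sketch of that special case, using noise-free data y = 0.5*x + 0.2 (fit_line is an illustrative helper name):

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Noise-free example data: y = 0.5*x + 0.2
xs = [float(i) for i in range(16)]
ys = [0.5 * x + 0.2 for x in xs]

w, b = fit_line(xs, ys)
print(w, b)  # close to 0.5 and 0.2
```

This gives the line that gradient descent is trying to approach step by step.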

Gradient descent

A gradient descent algorithm for learning regression looks like below

w, b = some initial value
While model has not converged:
    y_hat = w.X + b
    error = MSE(y, y_hat) 
    back propagate (BPP) error and adjust weights

Each run of the above loop is called an epoch. However, due to resource constraints, the calculation of y_hat, error and BPP is not performed on the full dataset; instead the data is divided into smaller batches and the above operations are performed on one batch at a time. Also, we normally fix the number of epochs and monitor whether the model has converged.

w, b = some initial value
for i in range(number_of_epochs):
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)
        back propagate (BPP) error and adjust weights
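A runnable sketch of the batched loop, with the weights adjusted after every batch. It assumes plain gradient descent (not the RMSprop optimizer used in the question), starting values of w = 0 and b = 0, no shuffling, and a small learning rate of 0.001 chosen so the steps stay stable on this data. With 12 examples, a batch size of 4 and 4 epochs it performs 3 updates per epoch, so 12 intermediate lines in total.

```python
# Data from the question.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
labels = [5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2]


def get_next_batch(xs, ys, batch_size):
    """Yield consecutive (x_batch, y_batch) slices without shuffling."""
    for start in range(0, len(xs), batch_size):
        yield xs[start:start + batch_size], ys[start:start + batch_size]


def mse(w, b, xs, ys):
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, epochs, batch_size, learning_rate):
    w, b = 0.0, 0.0  # assumed starting values
    steps = []
    for epoch in range(epochs):
        for x_batch, y_batch in get_next_batch(xs, ys, batch_size):
            n = len(x_batch)
            residuals = [(w * x + b) - y for x, y in zip(x_batch, y_batch)]
            # Batch loss, computed with the weights *before* the update.
            batch_loss = sum(r * r for r in residuals) / n
            # Gradients of the batch MSE with respect to w and b.
            grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, x_batch))
            grad_b = 2.0 / n * sum(residuals)
            # One weight update per batch (one "iteration").
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
            steps.append((epoch, w, b, batch_loss))
    return steps


steps = train(features, labels, epochs=4, batch_size=4, learning_rate=0.001)
for epoch, w, b, loss in steps:
    print(f"epoch {epoch + 1}: w={w:.4f} b={b:.4f} batch MSE before update={loss:.2f}")
```

Each tuple in steps is one of the intermediate "models" the question asks about: the slope and intercept after a single batch update.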

Keras implementation of batches

Let's say we would like to add the root mean squared error for tracking the model's performance while it is training. Keras implements this as below

w, b = some initial value
for i in range(number_of_epochs):
    all_y_hats = []
    all_ys = []
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)

        all_y_hats.extend(y_hat)
        all_ys.extend(y_batch)

        batch_rms_error = RMSE(all_ys, all_y_hats)

        back propagate (BPP) error and adjust weights

As you can see above, the predictions are accumulated and the RMSE is calculated on the accumulated predictions rather than by taking the mean of all previous batch RMSEs.
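The difference between the two ways of aggregating can be seen with a small sketch. The residuals below are made-up numbers for two batches of two samples, and running_rmse is an illustrative helper, not a Keras function.

```python
import math


def running_rmse(batches):
    """Yield the RMSE over all residuals seen so far, after each batch."""
    total_sq = 0.0
    count = 0
    for residuals in batches:
        total_sq += sum(r * r for r in residuals)
        count += len(residuals)
        yield math.sqrt(total_sq / count)


# Hypothetical residuals (y_hat - y) for two batches.
batches = [[1.0, 1.0], [3.0, 3.0]]

reported = list(running_rmse(batches))
per_batch = [math.sqrt(sum(r * r for r in batch) / len(batch)) for batch in batches]
mean_of_batch_rmse = sum(per_batch) / len(per_batch)

print(reported)            # RMSE on accumulated residuals: 1.0, then sqrt(5)
print(mean_of_batch_rmse)  # mean of the per-batch RMSEs: 2.0
```

The accumulated value after the second batch is sqrt(5), about 2.236, while the mean of the two batch RMSEs is 2.0, so the two approaches do not agree in general.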

Implementation in keras

Now that our foundation is clear, let's see how we can track the same thing in Keras. Keras has callbacks, so we can hook into the on_batch_begin callback and accumulate all_y_hats. In the on_batch_end callback, Keras gives us the calculated RMSE. We will manually calculate the RMSE from our accumulated all_y_hats and the matching labels, and verify that it is the same as what Keras calculated. We will also save the weights so that we can later plot the line that is being learned.

import numpy as np
from sklearn.metrics import mean_squared_error
import keras
import matplotlib.pyplot as plt

# Some training data
X = np.arange(16)
y = 0.5*X +0.2

batch_size = 8
all_y_hats = []
learned_weights = [] 

class CustomCallback(keras.callbacks.Callback):
  def on_batch_begin(self, batch, logs={}):    
    w = self.model.layers[0].weights[0].numpy()[0][0]
    b = self.model.layers[0].weights[1].numpy()[0]    
    s = batch*batch_size
    all_y_hats.extend(b + w*X[s:s+batch_size])    
    learned_weights.append([w,b])

  def on_batch_end(self, batch, logs={}):    
    calculated_error = np.sqrt(mean_squared_error(all_y_hats, y[:len(all_y_hats)]))
    print(f"\n Calculated: {calculated_error},  Actual: {logs['root_mean_squared_error']}")
    assert np.isclose(calculated_error, logs['root_mean_squared_error'])

  def on_epoch_end(self, batch, logs={}):
    del all_y_hats[:]    


model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape=(1,)))
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.01), loss="mean_squared_error",  metrics=[keras.metrics.RootMeanSquaredError()])
# We should set shuffle=False so that we know how batches are divided
history = model.fit(X,y, epochs=100, callbacks=[CustomCallback()], batch_size=batch_size, shuffle=False)

Output:

Epoch 1/100
 8/16 [==============>...............] - ETA: 0s - loss: 16.5132 - root_mean_squared_error: 4.0636
 Calculated: 4.063645694548688,  Actual: 4.063645839691162

 Calculated: 8.10112834945773,  Actual: 8.101128578186035
16/16 [==============================] - 0s 3ms/step - loss: 65.6283 - root_mean_squared_error: 8.1011
Epoch 2/100
 8/16 [==============>...............] - ETA: 0s - loss: 14.0454 - root_mean_squared_error: 3.7477
 Calculated: 3.7477213352845675,  Actual: 3.7477214336395264
-------------- truncated -----------------------

Ta-da! The assertion np.isclose(calculated_error, logs['root_mean_squared_error']) never failed, so our calculation/understanding is correct.

The line

Finally, let's plot the line that is being adjusted by the BPP algorithm based on the mean squared error loss. We can use the code below to create a PNG image of the line being learned at each batch, along with the training data.

for i, (w,b) in enumerate(learned_weights):
  plt.close()
  plt.axis([-1, 18, -1, 10])
  plt.scatter(X, y)
  plt.plot([-1,17], [-1*w+b, 17*w+b], color='green')
  plt.savefig(f'img{i+1}.png')

Below is the GIF animation of the above images, in the order they were learned.

[image: animation of the line being learned]

The hyperplane (a line in this case) being learned when y = 0.5*X + 5.2

[image: animation of the line being learned]

Jan Musil · Answer

I tried to play with it a little, and I think it is working like this:

1. Weights for each feature are initialized (usually randomly, depending on settings). The bias, which is initially 0.0, is also initialized.
2. Loss and metrics for the first batch are computed and printed, and the weights and bias are updated.
3. Step 2 is repeated for all batches in the epoch. However, after the last batch the loss and metrics are not printed again, so what you see on screen are the loss and metrics from before the last update in the epoch.
4. A new epoch is started, and the first loss and metrics you see printed are the ones computed with the last updated weights from the previous epoch.

So basically, I think it can be said intuitively that the loss is computed first and the weights are updated afterwards, which means that the weight update is the last operation in an epoch.

If your model is trained with one epoch and one batch, then what you see on screen is the loss computed with the initial weights and bias. If you want to see the loss and metrics after the end of each epoch (with the most "current" weights), you can pass the parameter validation_data=(X,y) to the fit method. That tells the algorithm to compute the loss and metrics once more on the given validation data when the epoch is finished.

Regarding the initial weights of the model, you can test this by manually setting some initial weights for the layer (using the kernel_initializer parameter):

  model.add(tf.keras.layers.Dense(units=1,
                                  input_shape=(1,),
                                  kernel_initializer=tf.constant_initializer(.5)))
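For comparison, when no initializer is passed, a Dense layer uses the Glorot uniform initializer for the kernel and zeros for the bias by default. A sketch of what that implies for a layer with one input and one output follows; glorot_uniform_limit is an illustrative helper rather than a Keras function, and the seed is arbitrary.

```python
import math
import random


def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the uniform range used by Glorot uniform initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


# One input feature, one output unit.
limit = glorot_uniform_limit(1, 1)  # sqrt(3), about 1.732

rng = random.Random(0)  # fixed seed so the sketch is repeatable
initial_slope = rng.uniform(-limit, limit)
initial_bias = 0.0  # default bias initializer is zeros

print(limit, initial_slope, initial_bias)
```

So for this model the starting slope is a random number between roughly -1.73 and 1.73, and the starting intercept is 0.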

Here is the updated part of the train_model function, which shows what I meant:

  def train_model(model, feature, label, epochs, batch_size):
        """Train the model by feeding it data."""

        # Feed the feature values and the label values to the
        # model. The model will train for the specified number
        # of epochs, gradually learning how the feature values
        # relate to the label values.
        init_slope = model.get_weights()[0][0][0]
        init_bias = model.get_weights()[1][0]
        print('init slope is {}'.format(init_slope))
        print('init bias is {}'.format(init_bias))

        history = model.fit(x=feature,
                          y=label,
                          batch_size=batch_size,
                          epochs=epochs,
                          validation_data=(feature,label))

        # Gather the trained model's weight and bias.
        #print(model.get_weights())
        trained_weight = model.get_weights()[0]
        trained_bias = model.get_weights()[1]
        print("Slope")
        print(trained_weight)
        print("Intercept")
        print(trained_bias)
        # Manually recompute the predictions, loss and RMSE with the
        # trained weights (requires numpy imported as np).
        prediction_manual = [trained_weight[0][0]*i + trained_bias[0] for i in feature]

        manual_loss = np.mean(((np.array(label)-np.array(prediction_manual))**2))
        print('manually computed loss after slope and bias update is {}'.format(manual_loss))
        print('manually computed rmse after slope and bias update is {}'.format(manual_loss**(1/2)))

        prediction_manual_init = [init_slope*i + init_bias for i in feature]
        manual_loss_init = np.mean(((np.array(label)-np.array(prediction_manual_init))**2))
        print('manually computed loss with init slope and bias is {}'.format(manual_loss_init))
        print('manually computed rmse with init slope and bias is {}'.format(manual_loss_init**(1/2)))

output:

"""
init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 402.9850 - root_mean_squared_error: 20.0745 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
Slope
[[0.65811384]]
Intercept
[0.15811387]
manually computed loss after slope and bias update is 352.3350379264957
manually computed rmse after slope and bias update is 18.77058970641295
manually computed loss with init slope and bias is 402.98499999999996
manually computed rmse with init slope and bias is 20.074486294797182
"""

Note that the manually computed loss and metrics after the slope and bias update match the validation loss and metrics, and the manually computed loss and metrics before the update match the loss and metrics reported for the initial slope and bias.
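The updated slope and intercept in the output above can also be reproduced by hand. The sketch below assumes the optimizer keeps a running mean of squared gradients that starts at zero, with rho = 0.9 and epsilon = 1e-7 (which I believe are the Keras RMSprop defaults) and no momentum; rmsprop_step is an illustrative helper, not the Keras implementation. Under these assumptions the very first step is approximately learning_rate / sqrt(1 - rho) = 0.05 / sqrt(0.1), about 0.158, for every parameter, regardless of the size of the gradient.

```python
import math

# Data from the question.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
labels = [5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2]


def mse(w, b, xs, ys):
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def rmsprop_step(param, grad, mean_sq, learning_rate, rho=0.9, epsilon=1e-7):
    """One RMSprop update without momentum; returns (new_param, new_mean_sq)."""
    mean_sq = rho * mean_sq + (1.0 - rho) * grad * grad
    param = param - learning_rate * grad / (math.sqrt(mean_sq) + epsilon)
    return param, mean_sq


# Initial values from the answer: slope 0.5 (constant initializer), bias 0.0.
w, b = 0.5, 0.0
n = len(features)

# Gradients of the full-batch MSE at the initial weights.
residuals = [(w * x + b) - y for x, y in zip(features, labels)]
grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, features))
grad_b = 2.0 / n * sum(residuals)

# One update with the question's learning rate of 0.05.
w, _ = rmsprop_step(w, grad_w, 0.0, learning_rate=0.05)
b, _ = rmsprop_step(b, grad_b, 0.0, learning_rate=0.05)
loss_after = mse(w, b, features, labels)

print(w, b)        # about 0.658114 and 0.158114
print(loss_after)  # about 352.335
```

The resulting slope, intercept and loss agree with the Slope, Intercept and val_loss values printed above, which suggests this is how the single update in that run was computed.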

Regarding the second question, I think you could split your data into batches manually and then iterate over the batches, fitting on each one. Then, in each iteration, the model prints the loss and metrics for the validation data. Something like this:

  init_slope = model.get_weights()[0][0][0]
  init_bias = model.get_weights()[1][0]
  print('init slope is {}'.format(init_slope))
  print('init bias is {}'.format(init_bias))
  batch_size = 3

  for idx in range(0,len(feature),batch_size):
      model.fit(x=feature[idx:idx+batch_size],
                y=label[idx:idx+batch_size],
                batch_size=1000,
                epochs=epochs,
                validation_data=(feature,label))
      print('slope: {}'.format(model.get_weights()[0][0][0]))
      print('intercept: {}'.format(model.get_weights()[1][0]))
      print('x data used: {}'.format(feature[idx:idx+batch_size]))
      print('y data used: {}'.format(label[idx:idx+batch_size]))

output:

init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 48.9000 - root_mean_squared_error: 6.9929 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
slope: 0.6581138372421265
intercept: 0.15811386704444885
x data used: [1.0, 2.0, 3.0]
y data used: [5.0, 8.8, 9.6]
1/1 [==============================] - 0s 21ms/step - loss: 200.9296 - root_mean_squared_error: 14.1750 - val_loss: 306.3082 - val_root_mean_squared_error: 17.5017
slope: 0.8132714033126831
intercept: 0.3018075227737427
x data used: [4.0, 5.0, 6.0]
y data used: [14.2, 18.8, 19.5]
1/1 [==============================] - 0s 22ms/step - loss: 363.2630 - root_mean_squared_error: 19.0595 - val_loss: 266.7119 - val_root_mean_squared_error: 16.3313
slope: 0.9573485255241394
intercept: 0.42669767141342163
x data used: [7.0, 8.0, 9.0]
y data used: [21.4, 26.8, 28.9]
1/1 [==============================] - 0s 22ms/step - loss: 565.5593 - root_mean_squared_error: 23.7815 - val_loss: 232.1553 - val_root_mean_squared_error: 15.2366
slope: 1.0924618244171143
intercept: 0.5409283638000488
x data used: [10.0, 11.0, 12.0]
y data used: [32.0, 33.8, 38.2]

Tags: python, matplotlib, tensorflow, keras, regression

Stat Tistician
