How to overfit data with Keras?

Question

I'm trying to build a simple regression model using keras and tensorflow. In my problem I have data in the form (x, y), where x and y are simply numbers. I'd like to build a keras model in order to predict y using x as an input.

Since I think an image explains things better, here is my data:

enter image description here

We can debate whether the data are good or not, but in my problem I cannot really change them.

My keras model is the following (the data are split into 30% test (X_test, y_test) and 70% training (X_train, y_train)):

model = tf.keras.Sequential()

model.add(tf.keras.layers.Dense(32, input_shape=(), activation="relu", name="first_layer"))
model.add(tf.keras.layers.Dense(16, activation="relu", name="second_layer"))
model.add(tf.keras.layers.Dense(1, name="output_layer"))

model.compile(loss = "mean_squared_error", optimizer = "adam", metrics=["mse"] )

history = model.fit(X_train, y_train, epochs=500, batch_size=1, verbose=0, shuffle=False) 
eval_result = model.evaluate(X_test, y_test)
print("\n\nTest loss:", eval_result, "\n")

predict_Y = model.predict(X)

note: X contains both X_test and X_train.

Plotting the prediction I get (blue squares are the prediction predict_Y)

enter image description here

I'm experimenting a lot with layers, activation functions and other parameters. My goal is to find the best parameters to train the model, but the actual question here is slightly different: I'm having a hard time forcing the model to overfit the data (as you can see from the results above).

Does anyone have some sort of idea about how to reproduce overfitting?

This is the outcome I would like to get: enter image description here

(red dots are under blue squares!)

EDIT:

Here are the data used in the example above; you can copy and paste them directly into a Python interpreter:

X_train = [0.704619794270697, 0.6779457393024553, 0.8207082120250023, 0.8588819357831449, 0.8692320257603844, 0.6878750931810429, 0.9556331888763945, 0.77677964510883, 0.7211381534179618, 0.6438319113259414, 0.6478339581502052, 0.9710222750072649, 0.8952188423349681, 0.6303124926673513, 0.9640316662124185, 0.869691568491902, 0.8320164648420931, 0.8236399177660375, 0.8877334038470911, 0.8084042532069621, 0.8045680821762038]
y_train = [0.7766424210611557, 0.8210846773655833, 0.9996114311913593, 0.8041331063189883, 0.9980525368790883, 0.8164056182686034, 0.8925487603333683, 0.7758207470960685, 0.37345286573743475, 0.9325789202459493, 0.6060269037514895, 0.9319771743389491, 0.9990691225991941, 0.9320002808310418, 0.9992560731072977, 0.9980241561997089, 0.8882905258641204, 0.4678339275898943, 0.9312152374846061, 0.9542371205095945, 0.8885893668675711]
X_test = [0.9749191829308574, 0.8735366740730178, 0.8882783211709133, 0.8022891400991644, 0.8650601322313454, 0.8697902997857514, 1.0, 0.8165876695985228, 0.8923841531760973]
y_test = [0.975653685270635, 0.9096752789481569, 0.6653736469114154, 0.46367666660348744, 0.9991817903431941, 1.0, 0.9111205717076893, 0.5264993912088891, 0.9989199241685126]
X = [0.704619794270697, 0.77677964510883, 0.7211381534179618, 0.6478339581502052, 0.6779457393024553, 0.8588819357831449, 0.8045680821762038, 0.8320164648420931, 0.8650601322313454, 0.8697902997857514, 0.8236399177660375, 0.6878750931810429, 0.8923841531760973, 0.8692320257603844, 0.8877334038470911, 0.8735366740730178, 0.8207082120250023, 0.8022891400991644, 0.6303124926673513, 0.8084042532069621, 0.869691568491902, 0.9710222750072649, 0.9556331888763945, 0.8882783211709133, 0.8165876695985228, 0.6438319113259414, 0.8952188423349681, 0.9749191829308574, 1.0, 0.9640316662124185]
Y = [0.7766424210611557, 0.7758207470960685, 0.37345286573743475, 0.6060269037514895, 0.8210846773655833, 0.8041331063189883, 0.8885893668675711, 0.8882905258641204, 0.9991817903431941, 1.0, 0.4678339275898943, 0.8164056182686034, 0.9989199241685126, 0.9980525368790883, 0.9312152374846061, 0.9096752789481569, 0.9996114311913593, 0.46367666660348744, 0.9320002808310418, 0.9542371205095945, 0.9980241561997089, 0.9319771743389491, 0.8925487603333683, 0.6653736469114154, 0.5264993912088891, 0.9325789202459493, 0.9990691225991941, 0.975653685270635, 0.9111205717076893, 0.9992560731072977]

Here X contains the list of x values and Y the corresponding y values. (X_test, y_test) and (X_train, y_train) are two non-overlapping subsets of (X, Y).

To predict and show the model results I simply use matplotlib (imported as plt):

predict_Y = model.predict(X)
plt.plot(X, Y, "ro", X, predict_Y, "bs")
plt.show()

Ankur · Accepted Answer

Overfitted models are rarely useful in real life. It appears to me that OP is well aware of that but wants to see if NNs are indeed capable of fitting (bounded) arbitrary functions or not. On one hand, the input-output data in the example seems to obey no discernible pattern. On the other hand, both input and output are scalars in [0, 1] and there are only 21 data points in the training set.

Based on my experiments and results, we can indeed overfit as requested. See the image below.

enter image description here

Numerical results:

           x    y_true    y_pred     error
0   0.704620  0.776642  0.773753 -0.002889
1   0.677946  0.821085  0.819597 -0.001488
2   0.820708  0.999611  0.999813  0.000202
3   0.858882  0.804133  0.805160  0.001026
4   0.869232  0.998053  0.997862 -0.000190
5   0.687875  0.816406  0.814692 -0.001714
6   0.955633  0.892549  0.893117  0.000569
7   0.776780  0.775821  0.779289  0.003469
8   0.721138  0.373453  0.374007  0.000554
9   0.643832  0.932579  0.912565 -0.020014
10  0.647834  0.606027  0.607253  0.001226
11  0.971022  0.931977  0.931549 -0.000428
12  0.895219  0.999069  0.999051 -0.000018
13  0.630312  0.932000  0.930252 -0.001748
14  0.964032  0.999256  0.999204 -0.000052
15  0.869692  0.998024  0.997859 -0.000165
16  0.832016  0.888291  0.887883 -0.000407
17  0.823640  0.467834  0.460728 -0.007106
18  0.887733  0.931215  0.932790  0.001575
19  0.808404  0.954237  0.960282  0.006045
20  0.804568  0.888589  0.906829  0.018240
{'me': -0.00015776709314323828, 
 'mae': 0.00329163070145315, 
 'mse': 4.0713782563067185e-05, 
 'rmse': 0.006380735268216915}
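
The four keys in the dictionary above are the mean error, mean absolute error, mean squared error and root mean squared error of the predictions on the training set. As a minimal pure-Python sketch of those definitions (the helper name `error_summary` is made up for illustration, and the inputs are just the first three rows of the table, rounded as printed):

```python
import math


def error_summary(y_true, y_pred):
    """Return the same four error measures used in the answer."""
    # Signed error, same convention as the table: prediction minus truth.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        'me': sum(errors) / n,                    # mean (signed) error
        'mae': sum(abs(e) for e in errors) / n,   # mean absolute error
        'mse': mse,                               # mean squared error
        'rmse': math.sqrt(mse),                   # root mean squared error
    }


# First three rows of the table above (values as printed there).
y_true = [0.776642, 0.821085, 0.999611]
y_pred = [0.773753, 0.819597, 0.999813]

summary = error_summary(y_true, y_pred)
print(summary)
```

A mean error close to zero only says that over- and under-predictions cancel out; the MAE and RMSE are the numbers that show how tightly the training points are matched.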

OP's code seems good to me. My changes were minor:

Use deeper networks. It may not actually be necessary to use a depth of 30 layers but since we just want to overfit, I didn't experiment too much with what's the minimum depth needed.
Each Dense layer has 50 units. Again, this may be overkill.
Added a batch normalization layer after every 5th dense layer.
Decreased the learning rate by half.
Ran the optimization for longer, using all 21 training examples in a single batch.
Used MAE as the objective function. MSE is good, but since we want to overfit, I want to penalize small errors the same way as large errors.
The random seed matters more here because the data appear to be arbitrary. You should still get similar results if you change the seed and let the optimizer run long enough. In some cases, the optimization does get stuck in a local minimum and does not produce the overfitting requested by OP.
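
To illustrate the point about MAE versus MSE, here is a small sketch of the per-sample gradient of each loss with respect to the error. It is not part of the training code; the function names `mse_grad` and `mae_grad` are made up for illustration, and the 1/n averaging over the batch is left out for simplicity.

```python
def mse_grad(error):
    # Derivative of error**2 with respect to error: shrinks as the error shrinks.
    return 2.0 * error


def mae_grad(error):
    # Derivative of |error| with respect to error: constant magnitude.
    # At exactly 0 the derivative is undefined; 0.0 is used as a subgradient.
    if error > 0:
        return 1.0
    if error < 0:
        return -1.0
    return 0.0


for error in (0.2, 0.02, 0.002):
    print(f"error={error}: "
          f"MSE gradient={mse_grad(error):.4f}, "
          f"MAE gradient={mae_grad(error):.1f}")
```

With MSE the pull on a point fades as its error gets small, whereas with MAE a point that is off by 0.002 is pushed as hard as one that is off by 0.2, which is the behaviour wanted when the goal is to hit every training point.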

The code is below.

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt

# Set seed just to have reproducible results
np.random.seed(84)
tf.random.set_seed(84)

# Load data from the post
# https://stackoverflow.com/questions/61252785/how-to-overfit-data-with-keras
X_train = np.array([0.704619794270697, 0.6779457393024553, 0.8207082120250023,
                    0.8588819357831449, 0.8692320257603844, 0.6878750931810429,
                    0.9556331888763945, 0.77677964510883, 0.7211381534179618,
                    0.6438319113259414, 0.6478339581502052, 0.9710222750072649,
                    0.8952188423349681, 0.6303124926673513, 0.9640316662124185,
                    0.869691568491902, 0.8320164648420931, 0.8236399177660375,
                    0.8877334038470911, 0.8084042532069621,
                    0.8045680821762038])
Y_train = np.array([0.7766424210611557, 0.8210846773655833, 0.9996114311913593,
                    0.8041331063189883, 0.9980525368790883, 0.8164056182686034,
                    0.8925487603333683, 0.7758207470960685,
                    0.37345286573743475, 0.9325789202459493,
                    0.6060269037514895, 0.9319771743389491, 0.9990691225991941,
                    0.9320002808310418, 0.9992560731072977, 0.9980241561997089,
                    0.8882905258641204, 0.4678339275898943, 0.9312152374846061,
                    0.9542371205095945, 0.8885893668675711])
X_test = np.array([0.9749191829308574, 0.8735366740730178, 0.8882783211709133,
                   0.8022891400991644, 0.8650601322313454, 0.8697902997857514,
                   1.0, 0.8165876695985228, 0.8923841531760973])
Y_test = np.array([0.975653685270635, 0.9096752789481569, 0.6653736469114154,
                   0.46367666660348744, 0.9991817903431941, 1.0,
                   0.9111205717076893, 0.5264993912088891, 0.9989199241685126])
X = np.array([0.704619794270697, 0.77677964510883, 0.7211381534179618,
              0.6478339581502052, 0.6779457393024553, 0.8588819357831449,
              0.8045680821762038, 0.8320164648420931, 0.8650601322313454,
              0.8697902997857514, 0.8236399177660375, 0.6878750931810429,
              0.8923841531760973, 0.8692320257603844, 0.8877334038470911,
              0.8735366740730178, 0.8207082120250023, 0.8022891400991644,
              0.6303124926673513, 0.8084042532069621, 0.869691568491902,
              0.9710222750072649, 0.9556331888763945, 0.8882783211709133,
              0.8165876695985228, 0.6438319113259414, 0.8952188423349681,
              0.9749191829308574, 1.0, 0.9640316662124185])
Y = np.array([0.7766424210611557, 0.7758207470960685, 0.37345286573743475,
              0.6060269037514895, 0.8210846773655833, 0.8041331063189883,
              0.8885893668675711, 0.8882905258641204, 0.9991817903431941, 1.0,
              0.4678339275898943, 0.8164056182686034, 0.9989199241685126,
              0.9980525368790883, 0.9312152374846061, 0.9096752789481569,
              0.9996114311913593, 0.46367666660348744, 0.9320002808310418,
              0.9542371205095945, 0.9980241561997089, 0.9319771743389491,
              0.8925487603333683, 0.6653736469114154, 0.5264993912088891,
              0.9325789202459493, 0.9990691225991941, 0.975653685270635,
              0.9111205717076893, 0.9992560731072977])

# Reshape all data to be of the shape (batch_size, 1)
X_train = X_train.reshape((-1, 1))
Y_train = Y_train.reshape((-1, 1))
X_test = X_test.reshape((-1, 1))
Y_test = Y_test.reshape((-1, 1))
X = X.reshape((-1, 1))
Y = Y.reshape((-1, 1))

# Is data scaled? NNs do well with bounded data.
assert np.all(X_train >= 0) and np.all(X_train <= 1)
assert np.all(Y_train >= 0) and np.all(Y_train <= 1)
assert np.all(X_test >= 0) and np.all(X_test <= 1)
assert np.all(Y_test >= 0) and np.all(Y_test <= 1)
assert np.all(X >= 0) and np.all(X <= 1)
assert np.all(Y >= 0) and np.all(Y <= 1)

# Build a model with variable number of hidden layers.
# We will use Keras functional API.
# https://www.perfectlyrandom.org/2019/06/24/a-guide-to-keras-functional-api/
n_dense_layers = 30  # increase this to get more complicated models

# Define the layers first.
input_tensor = Input(shape=(1,), name='input')
layers = []
for i in range(n_dense_layers):
    layers += [Dense(units=50, activation='relu', name=f'dense_layer_{i}')]
    if i > 0 and i % 5 == 0:
        # axis=1 is the feature axis: each feature is normalized across the batch
        layers += [BatchNormalization(axis=1)]
sigmoid_layer = Dense(units=1, activation='sigmoid', name='sigmoid_layer')

# Connect the layers using Keras Functional API
mid_layer = input_tensor
for dense_layer in layers:
    mid_layer = dense_layer(mid_layer)
output_tensor = sigmoid_layer(mid_layer)
model = Model(inputs=[input_tensor], outputs=[output_tensor])
optimizer = Adam(learning_rate=0.0005)
model.compile(optimizer=optimizer, loss='mae', metrics=['mae'])
model.fit(x=[X_train], y=[Y_train], epochs=40000, batch_size=21)

# Predict on various datasets
Y_train_pred = model.predict(X_train)

# Create a dataframe to inspect results manually
train_df = pd.DataFrame({
    'x': X_train.reshape((-1)),
    'y_true': Y_train.reshape((-1)),
    'y_pred': Y_train_pred.reshape((-1))
})
train_df['error'] = train_df['y_pred'] - train_df['y_true']
print(train_df)

# A dictionary to store all the errors in one place.
train_errors = {
    'me': np.mean(train_df['error']),
    'mae': np.mean(np.abs(train_df['error'])),
    'mse': np.mean(np.square(train_df['error'])),
    'rmse': np.sqrt(np.mean(np.square(train_df['error']))),
}
print(train_errors)

# Make a plot to visualize true vs predicted
plt.figure(1)
plt.clf()
plt.plot(train_df['x'], train_df['y_true'], 'r.', label='y_true')
plt.plot(train_df['x'], train_df['y_pred'], 'bo', alpha=0.25, label='y_pred')
plt.grid(True)
plt.xlabel('x')
plt.ylabel('y')
plt.title(f'Train data. MSE={np.round(train_errors["mse"], 5)}.')
plt.legend()
# Save before showing: with some backends the figure is empty after show()
plt.savefig('true_vs_pred.png')
plt.show(block=False)
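
To put the "may be overkill" remarks in numbers, the sketch below counts the parameters of the architecture above by hand. It assumes the Keras defaults (every Dense layer has a bias; BatchNormalization has a trainable scale and offset plus a non-trainable moving mean and variance per feature), and the helper name `dense_params` is made up for illustration.

```python
def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit.
    return n_in * n_out + n_out


n_dense_layers = 30   # same values as in the code above
units = 50
n_train = 21          # number of training points

trainable = 0
non_trainable = 0
n_in = 1              # the input is a single scalar
for i in range(n_dense_layers):
    trainable += dense_params(n_in, units)
    n_in = units
    if i > 0 and i % 5 == 0:
        trainable += 2 * units       # batch norm scale (gamma) and offset (beta)
        non_trainable += 2 * units   # batch norm moving mean and variance
trainable += dense_params(n_in, 1)   # final sigmoid layer

print("trainable parameters:", trainable)
print("non-trainable parameters:", non_trainable)
print("trainable parameters per training point:", trainable // n_train)
```

This gives 74601 trainable parameters for 21 training points, so the network has far more capacity than is needed to memorize the training set; `model.summary()` can be used to check the count against Keras itself.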

Tags:

machine-learning

neural-network

keras

tf.keras

Gigino
