 

How do I take L1 and L2 regularizers into account in TensorFlow custom training loops?

While experimenting with the model.train_on_batch method and custom training loops, I realized that in the custom training loop code the loss and gradients do not take any L1/L2 regularizers into account, so optimizer.apply_gradients does not take them into account either. The code below demonstrates this, but the idea is simple. So my question is: is there a way to use all these optimizers, independently of the optimizer's details, so that the regularizers are taken into account? How does Keras implement it? On a related note, model.train_on_batch returns a value that is not the loss (as claimed in the docstring) but something else. Does anyone here know what it returns?

Code

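To see this effect, first create some data:

import numpy as np
import tensorflow as tf

x=tf.constant([[1]])  # a single sample with input 1
y=tf.constant([[1]])  # its (sparse) class label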

Then create a function that builds a reproducible model:

def make_model(l1=.01,l2=.01):
    tf.random.set_seed(42)
    np.random.seed(42)
    model=tf.keras.models.Sequential([
        tf.keras.layers.Dense(2,'softmax',
                              use_bias=False,
                              kernel_regularizer=tf.keras.regularizers.l1_l2(l1=l1,l2=l2),
                              input_shape=(1,))
    ])
    return model
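For reference, as far as I can tell from the Keras source, the l1_l2 regularizer used in make_model adds l1 * sum(|w|) + l2 * sum(w**2) to the loss for each regularized weight tensor, with no factor of 1/2 on the L2 term. The pure-Python sketch below computes that penalty for a flat list of weights; l1_l2_penalty is a hypothetical helper name for illustration, not a Keras API.

```python
def l1_l2_penalty(weights, l1=0.01, l2=0.01):
    """Hypothetical helper: the penalty an l1_l2 regularizer adds for one (flattened) weight tensor."""
    # L1 term: l1 times the sum of absolute values
    l1_term = l1 * sum(abs(w) for w in weights)
    # L2 term: l2 times the sum of squares (no 1/2 factor)
    l2_term = l2 * sum(w * w for w in weights)
    return l1_term + l2_term


print(l1_l2_penalty([0.5, -1.0]))
```

This is the amount by which the Keras loss exceeds the plain data loss, so it vanishes only when l1 and l2 are both zero (or all weights are zero).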

Now run Keras's train_on_batch:

model=make_model()
loss_object=tf.keras.losses.SparseCategoricalCrossentropy()
optimizer=tf.keras.optimizers.RMSprop()
model.compile(loss=loss_object,optimizer=optimizer)
model.train_on_batch(x,y)

and compare the output with that of a plain custom training loop, written as in the TensorFlow tutorials on custom training:

model=make_model()
loss_object=tf.keras.losses.SparseCategoricalCrossentropy()
optimizer=tf.keras.optimizers.RMSprop()

@tf.function
def train_step(x,y):

    with tf.GradientTape() as tape:
        predictions  = model(x)
        loss = loss_object(y, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

train_step(x,y).numpy()

You will see the two results are different unless l1==0 and l2==0.
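A quick way to see where the difference comes from is to look at model.losses, which (as I understand it) holds the regularization penalties evaluated on the current weights. The sketch below rebuilds the same layer as make_model (build_model is a hypothetical name, and it uses tf.keras.Input instead of input_shape) and prints the penalty for regularized and unregularized versions.

```python
import tensorflow as tf


def build_model(l1, l2):
    # Same layer as make_model; rebuilt so this snippet runs on its own
    return tf.keras.Sequential([
        tf.keras.Input(shape=(1,)),
        tf.keras.layers.Dense(2, activation="softmax", use_bias=False,
                              kernel_regularizer=tf.keras.regularizers.l1_l2(l1=l1, l2=l2)),
    ])


penalties = {}
for l1, l2 in [(0.01, 0.01), (0.0, 0.0)]:
    model = build_model(l1, l2)
    # model.losses contains the regularization penalty computed on the current kernel
    penalties[(l1, l2)] = float(sum(model.losses))
    print(l1, l2, penalties[(l1, l2)])
```

The penalty is zero only in the unregularized case, which matches the observation that the two results agree only when l1==0 and l2==0.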

asked Jan 31 '26 05:01 by Borun Chowdhury


1 Answer

Actually, I found the answer in Aurelien Geron's book.

In fact, after I implemented the code below, I found that this is covered in the TensorFlow guide on custom training (I don't know why it's not in the tutorials mentioned in the question, since it's an important point). The solution there is more general than the one given here, but I am keeping this one because it sheds a bit more light on what's happening.

So the fix is as simple as adding the regularization penalties to the loss inside the custom training loop:

import tensorflow as tf

def add_model_regularizer_loss(model):
    """Sum the kernel and bias regularization penalties of all layers, recursing into nested models."""
    # note: activity regularizers are not covered by this helper
    loss=0
    for l in model.layers:
        if hasattr(l,'layers') and l.layers: # the layer itself is a model
            loss+=add_model_regularizer_loss(l)
        if hasattr(l,'kernel_regularizer') and l.kernel_regularizer:
            loss+=l.kernel_regularizer(l.kernel)
        if hasattr(l,'bias_regularizer') and l.bias_regularizer:
            loss+=l.bias_regularizer(l.bias)
    return loss

def train_step(x,y):

    with tf.GradientTape() as tape:
        predictions  = model(x)
        loss = loss_object(y, predictions)
        # add the penalties inside the tape so their gradients are part of the update
        loss += add_model_regularizer_loss(model)

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
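For comparison, the more general approach, which as far as I can tell is the one in the TensorFlow guide on writing a training loop from scratch, is to add sum(model.losses) to the data loss. model.losses collects the penalties every layer registers (kernel, bias and activity regularizers, plus anything added with add_loss), including those of nested models, so no per-layer bookkeeping is needed. A minimal sketch, with the model rebuilt so the snippet is self-contained; returning both loss values is only for illustration.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(2, activation="softmax", use_bias=False,
                          kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.01, l2=0.01)),
])
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.RMSprop()


@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        data_loss = loss_object(y, predictions)
        # model.losses holds all regularization penalties; summing them inside
        # the tape makes their gradients part of the optimizer update
        loss = data_loss + sum(model.losses)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return data_loss, loss


x = tf.constant([[1.0]])
y = tf.constant([[1]])
data_loss, total_loss = train_step(x, y)
print(float(data_loss), float(total_loss))
```

Because this works through the optimizer-independent apply_gradients call, any optimizer can be swapped in without changes.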

To answer the second part of my question: this total (the data loss plus the regularization penalties) is the loss value that Keras reports from train_on_batch and in the fit logs.
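A quick check of this claim, assuming TF 2.2 or later (for the return_dict argument of train_on_batch): evaluate the data loss and the penalty on the initial weights, then compare their sum with what train_on_batch returns, since that value is computed before the weights are updated.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(2, activation="softmax", use_bias=False,
                          kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.01, l2=0.01)),
])
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
model.compile(loss=loss_object, optimizer=tf.keras.optimizers.RMSprop())

x = tf.constant([[1.0]])
y = tf.constant([[1]])

# Evaluate both pieces on the current weights, before train_on_batch updates them
data_loss = float(loss_object(y, model(x)))
penalty = float(sum(model.losses))

keras_loss = float(model.train_on_batch(x, y, return_dict=True)["loss"])
print(data_loss, penalty, keras_loss)
```

If the claim holds, keras_loss should match data_loss + penalty up to floating-point rounding, and differ from data_loss alone by the penalty.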

answered Feb 02 '26 21:02 by Borun Chowdhury


