What is the difference between Model.train_on_batch from Keras and Session.run([train_optimizer]) from TensorFlow?

Tags: python, machine-learning, tensorflow, keras

In the following Keras and TensorFlow implementations of training a neural network, how is model.train_on_batch([x], [y]) in the Keras implementation different from sess.run([train_optimizer, cross_entropy], feed_dict=feed_dict) in the TensorFlow implementation? In particular, how can these two lines lead to different computations during training?

keras_version.py

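# Imports assumed (standalone Keras 2.x, as used in the question)
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

# input_shape, num_classes, lr, epochs, batch_size, x_train, y_train are assumed to be defined elsewhere
input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes, activation="softmax")(input_x)  # outputs probabilities

model = Model([input_x], [c])
opt = Adam(lr)
# metrics=['accuracy'] is needed for train_on_batch to return [loss, accuracy];
# without it, train_on_batch returns only the scalar loss and the unpacking below fails
model.compile(loss=['categorical_crossentropy'], optimizer=opt, metrics=['accuracy'])

nb_batchs = len(x_train) // batch_size  # the incomplete last batch is dropped

for epoch in range(epochs):
    loss = 0.0
    for batch in range(nb_batchs):
        x = x_train[batch*batch_size:(batch+1)*batch_size]
        y = y_train[batch*batch_size:(batch+1)*batch_size]

        # one gradient step on this batch; returns [loss, accuracy]
        loss_batch, acc_batch = model.train_on_batch([x], [y])

        loss += loss_batch
    print(epoch, loss / nb_batchs)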

tensorflow_version.py

# Imports assumed (TensorFlow 1.x graph mode with standalone Keras layers, as in the question)
import tensorflow as tf
from keras.layers import Input, Dense

input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes)(input_x)  # no activation: c holds raw logits

input_y = tf.placeholder(tf.float32, shape=[None, num_classes], name="label")
# softmax_cross_entropy_with_logits_v2 applies the softmax internally, so it expects logits
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=input_y, logits=c, name="xentropy"),
    name="xentropy_mean"
)
train_optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cross_entropy)

nb_batchs = len(x_train) // batch_size  # the incomplete last batch is dropped

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        loss = 0.0

        for batch in range(nb_batchs):
            x = x_train[batch*batch_size:(batch+1)*batch_size]
            y = y_train[batch*batch_size:(batch+1)*batch_size]

            feed_dict = {input_x: x,
                         input_y: y}
            # running train_optimizer applies one Adam step; cross_entropy is fetched in the same run
            _, loss_batch = sess.run([train_optimizer, cross_entropy], feed_dict=feed_dict)

            loss += loss_batch
        print(epoch, loss / nb_batchs)

Note: This question follows up on "Same (?) model converges in Keras but not in Tensorflow", which was considered too broad, but in which I show exactly why I think these two statements differ and lead to different computations.


asked Nov 20 '18 15:11

LucG

1 Answer

Yes, the results can be different. This shouldn't be surprising once you know the following:

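1. The cross-entropy implementations in TensorFlow and Keras are different. tf.nn.softmax_cross_entropy_with_logits_v2 expects raw, unnormalized logits and applies the softmax itself, while Keras's categorical_crossentropy (with the default from_logits=False) expects probabilities, here produced by the softmax activation of the Dense layer, and clips them to [epsilon, 1 - epsilon] before taking the log.
2. The optimizer implementations in Keras and TensorFlow are different. For example, Keras's Adam defaults to epsilon = K.epsilon() = 1e-7, whereas tf.train.AdamOptimizer defaults to epsilon = 1e-8.
3. You might be shuffling the data, so the batches are not passed in the same order. This matters less if you train for long, but the first few epochs can be entirely different. Make sure the same batches are passed to both, then compare the results.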
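To illustrate the cross-entropy point, here is a minimal pure-Python sketch (not the libraries' actual code) that mimics both loss formulations, assuming Keras 2.x's categorical_crossentropy renormalizes and clips probabilities to [K.epsilon(), 1 - K.epsilon()] with the default epsilon of 1e-7. The function names keras_style_xent and tf_style_xent are hypothetical, and the sketch computes in float64 whereas both frameworks typically train in float32.

```python
import math

def softmax(logits):
    # numerically stable softmax: subtract the max logit before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def keras_style_xent(probs, target, eps=1e-7):
    # hypothetical mimic of Keras 2.x categorical_crossentropy (from_logits=False):
    # renormalize, clip to [eps, 1 - eps], then take -sum(target * log(p))
    s = sum(probs)
    clipped = [min(max(p / s, eps), 1.0 - eps) for p in probs]
    return -sum(t * math.log(p) for t, p in zip(target, clipped))

def tf_style_xent(logits, target):
    # hypothetical mimic of softmax_cross_entropy_with_logits_v2: uses log-softmax,
    # log_softmax(z)_i = z_i - logsumexp(z), so no clipping is involved
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - lse) for t, z in zip(target, logits))

# moderate logits: both formulations agree
logits = [1.0, 2.0, 0.5]
target = [0.0, 1.0, 0.0]
loss_keras = keras_style_xent(softmax(logits), target)
loss_tf = tf_style_xent(logits, target)

# confidently wrong prediction: the clipped, probability-based loss saturates
bad_logits = [0.0, 50.0]
bad_target = [1.0, 0.0]
bad_keras = keras_style_xent(softmax(bad_logits), bad_target)
bad_tf = tf_style_xent(bad_logits, bad_target)

print(loss_keras, loss_tf)
print(bad_keras, bad_tf)
```

For ordinary predictions the two losses match, but for a confidently wrong prediction the Keras-style value is capped at -log(1e-7) (about 16.12) while the logit-based value keeps growing (50 here), so the two models can receive different training signals early on.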
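For the optimizer point, the following minimal pure-Python sketch performs one Adam step on a single scalar parameter, following the update form documented for tf.train.AdamOptimizer (lr_t = lr * sqrt(1 - beta2^t) / (1 - beta1^t), step = lr_t * m / (sqrt(v) + epsilon)). It assumes Keras 2.x's Adam uses the same formula and differs only in its default epsilon of 1e-7; the helper name adam_step is hypothetical.

```python
import math

def adam_step(grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # hypothetical helper: one Adam update for a scalar parameter at step t
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    lr_t = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)  # bias-corrected rate
    step = lr_t * m / (math.sqrt(v) + eps)       # amount subtracted from the parameter
    return step, m, v

# first step (t=1) from zero moment estimates, with a large and a tiny gradient
big_tf, _, _ = adam_step(1.0, 0.0, 0.0, 1, eps=1e-8)      # TensorFlow default epsilon
big_keras, _, _ = adam_step(1.0, 0.0, 0.0, 1, eps=1e-7)   # Keras default epsilon
tiny_tf, _, _ = adam_step(1e-8, 0.0, 0.0, 1, eps=1e-8)
tiny_keras, _, _ = adam_step(1e-8, 0.0, 0.0, 1, eps=1e-7)

print(big_tf, big_keras)
print(tiny_tf, tiny_keras)
```

With a large gradient both steps are essentially lr, but with a very small gradient the epsilon term dominates the denominator and the two defaults give steps roughly an order of magnitude apart.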
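For the batching point, one way to guarantee that both training loops see identical batches is to precompute the batch indices once from a fixed seed. This is a minimal sketch; the helper name make_batches and the seed value are hypothetical.

```python
import random

def make_batches(n_samples, batch_size, seed=None):
    # hypothetical helper: returns one index list per full batch; the incomplete
    # last batch is dropped, matching how nb_batchs is computed in the question
    indices = list(range(n_samples))
    if seed is not None:
        # a dedicated Random instance makes the order reproducible across runs
        random.Random(seed).shuffle(indices)
    nb_batchs = n_samples // batch_size
    return [indices[b * batch_size:(b + 1) * batch_size] for b in range(nb_batchs)]

batches_for_keras = make_batches(10, 3, seed=0)
batches_for_tf = make_batches(10, 3, seed=0)
print(batches_for_keras == batches_for_tf, len(batches_for_keras))
```

Each index list can then be used as x_train[idx], y_train[idx] (assuming NumPy arrays) in both the Keras and the TensorFlow loop, so any remaining difference comes from the model, loss, or optimizer rather than the data order.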


answered Nov 15 '22 17:11

mlRocks
