
TensorFlow 2.0 Keras is training 4x slower than 2.0 Estimator

We recently switched to Keras for TF 2.0, but when we compared it to the DNNClassifier Estimator on 2.0, we saw roughly 4x slower training with Keras, and I cannot for the life of me figure out why. The rest of the code for both is identical: both use an input_fn() that returns the same tf.data.Dataset, and both use identical feature_columns. I've been struggling with this problem for days now. Any help would be greatly appreciated. Thank you.

Estimator code:

estimator = tf.estimator.DNNClassifier(
        feature_columns=feature_columns,
        hidden_units=[64, 64],
        activation_fn=tf.nn.relu,
        optimizer='Adagrad',
        dropout=0.4,
        n_classes=len(vocab),
        model_dir=model_dir,
        batch_norm=False)

estimator.train(input_fn=train_input_fn, steps=400)

Keras code:

feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

model = tf.keras.Sequential([
        feature_layer,
        layers.Dense(64, input_shape=(len(vocab),), activation=tf.nn.relu),
        layers.Dropout(0.4),
        layers.Dense(64, activation=tf.nn.relu),
        layers.Dropout(0.4),
        layers.Dense(len(vocab), activation='softmax')])

model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer='Adagrad',
        distribute=None)

model.fit(x=train_input_fn(),
          epochs=1,
          steps_per_epoch=400,
          shuffle=True)  # note: shuffle is ignored when x is a tf.data.Dataset

UPDATE: To test further, I wrote a custom subclassed Model (see: Get Started For Experts), which runs faster than Keras but slower than the Estimator. If the Estimator trains in 100 secs, the custom model takes roughly 180 secs and Keras roughly 350 secs. An interesting note is that the Estimator runs slower with Adam() than with Adagrad(), while Keras seems to run faster; with Adam(), Keras takes less than twice as long as DNNClassifier. Assuming I didn't mess up the custom code, I'm beginning to think that DNNClassifier just has a lot of backend optimizations/efficiencies that make it run faster than Keras.

Custom code:

class MyModel(Model):
  def __init__(self):
    super(MyModel, self).__init__()
    self.features = layers.DenseFeatures(feature_columns, trainable=False)
    self.dense = layers.Dense(64, activation='relu')
    self.dropout = layers.Dropout(0.4)
    self.dense2 = layers.Dense(64, activation='relu')
    self.dropout2 = layers.Dropout(0.4)
    self.softmax = layers.Dense(len(vocab), activation='softmax')

  def call(self, x, training=False):
    x = self.features(x)
    x = self.dense(x)
    x = self.dropout(x, training=training)
    x = self.dense2(x)
    x = self.dropout2(x, training=training)
    return self.softmax(x)

model = MyModel()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adagrad()

@tf.function
def train_step(features, label):
  with tf.GradientTape() as tape:
    # training=True so the Dropout layers are active during training
    predictions = model(features, training=True)
    loss = loss_object(label, predictions)
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

itera = iter(train_input_fn())
for i in range(400):
  features, labels = next(itera)
  train_step(features, labels)

UPDATE: The dataset may be the culprit. When I print a row of the dataset inside train_input_fn(), with Estimators it prints the non-eager Tensor definition; with Keras, it prints the eager values. Going through the Keras backend code, when Keras receives a tf.data.Dataset as input it handles it eagerly (and ONLY eagerly), which is why it was crashing whenever I used tf.function on train_input_fn(). Basically, my guess is that DNNClassifier trains faster than Keras because it runs more of the dataset code in graph mode. Will post any updates/finds.
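
For anyone who wants to experiment, here is a minimal sketch of what I mean by running the dataset code in graph mode, reusing train_input_fn() and train_step() from my custom code above (this assumes train_input_fn() only builds tf.data ops, so it can be traced):

# Sketch: iterate the dataset inside a tf.function so the input
# pipeline runs in graph mode instead of yielding eager values.
@tf.function
def train_loop():
  # AutoGraph turns this Python loop into graph-mode dataset iteration.
  for features, labels in train_input_fn().take(400):
    train_step(features, labels)

train_loop()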

asked Mar 14 '19 by Byest


2 Answers

For those who (like me) find this question and use Keras's Embedding layers:

Even if a GPU is present, when eager execution is enabled, Embedding layers are always placed on the CPU, causing a massive slow-down.

See https://github.com/tensorflow/tensorflow/issues/44194, which also contains a workaround.
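
A quick way to check whether this is affecting you (a generic sketch with a made-up toy model, not code from the linked issue) is to look at which device the embedding weights ended up on:

import tensorflow as tf

# Hypothetical toy model, only to illustrate the device check.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])
model.build(input_shape=(None, 20))

# If this prints a CPU device even though a GPU is available,
# the embedding lookups (and their gradients) run on the CPU.
print(model.layers[0].weights[0].device)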

answered Oct 19 '22 by Till Brychcy


I believe it is slower because it is not being executed as a graph. To execute as a graph in TF2, you need a function decorated with the @tf.function decorator. Check out this section for ideas on how to restructure your code.
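
As a rough illustration (a generic sketch, not the model from the question), you can time the same computation eagerly and wrapped in tf.function:

import time
import tensorflow as tf

x = tf.random.normal([1000, 64])
dense = tf.keras.layers.Dense(64)

def step(x):
  return dense(x)

graph_step = tf.function(step)  # same computation, compiled into a graph
graph_step(x)                   # first call traces and builds the graph

start = time.time()
for _ in range(1000):
  step(x)
print('eager:', time.time() - start)

start = time.time()
for _ in range(1000):
  graph_step(x)
print('graph:', time.time() - start)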

answered Oct 19 '22 by DecentGradient