InternalError when using TPU for training Keras model

I am attempting to fine-tune a BERT model from TensorFlow Hub on Google Colab, following this link.

However, I run into the following error:

InternalError: RET_CHECK failure (third_party/tensorflow/core/tpu/graph_rewrite/distributed_tpu_rewrite_pass.cc:2047) arg_shape.handle_type != DT_INVALID  input edge: [id=2693 model_preprocessing_67660:0 -> cluster_train_function:628]

The error is raised when I call my model.fit(...) function.

This error only occurs when I try to use a TPU (training runs fine on a CPU, but takes a very long time).

Here is my code for setting up the TPU and model:

TPU Setup:

import os
import tensorflow as tf

# Load TF Hub models uncompressed so the TPU workers can read them
os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "UNCOMPRESSED"

# Connect to the Colab TPU and create a distribution strategy
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
strategy = tf.distribute.TPUStrategy(cluster_resolver)
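
A quick sanity check (an addition, not from the original post) is to list the TPU cores before building the model:

# Should print eight logical TPU devices on a standard Colab TPU
print("TPU devices:", tf.config.list_logical_devices('TPU'))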

Model Setup:

import tensorflow_hub as hub

def build_classifier_model():
  # Raw string input; the TF Hub preprocessing layer handles tokenization
  text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
  preprocessing_layer = hub.KerasLayer('https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3', name='preprocessing')
  encoder_inputs = preprocessing_layer(text_input)
  encoder = hub.KerasLayer('https://tfhub.dev/google/experts/bert/wiki_books/sst2/2', trainable=True, name='BERT_encoder')
  outputs = encoder(encoder_inputs)
  # Pooled [CLS] representation feeding a single-logit classification head
  net = outputs['pooled_output']
  net = tf.keras.layers.Dropout(0.1)(net)
  net = tf.keras.layers.Dense(1, activation=None, name='classifier')(net)
  return tf.keras.Model(text_input, net)
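
A minimal smoke test (also an addition, assuming it is run eagerly on the CPU before connecting to the TPU) verifies that the model builds and emits one logit per example:

# Build the model and push a single raw string through it
test_model = build_classifier_model()
print(test_model(tf.constant(['such a great movie!'])))  # expect shape (1, 1)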

Model Training:

# Assuming 'optimization' comes from the TF Model Garden (pip install tf-models-official)
from official.nlp import optimization

with strategy.scope():

  bert_model = build_classifier_model()
  loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
  metrics = tf.metrics.BinaryAccuracy()
  epochs = 1
  steps_per_epoch = 1280000
  num_train_steps = steps_per_epoch * epochs
  num_warmup_steps = int(0.1 * num_train_steps)

  init_lr = 3e-5
  optimizer = optimization.create_optimizer(init_lr=init_lr,
                                            num_train_steps=num_train_steps,
                                            num_warmup_steps=num_warmup_steps,
                                            optimizer_type='adamw')
  bert_model.compile(optimizer=optimizer,
                     loss=loss,
                     metrics=metrics)
  print('Training model')
  history = bert_model.fit(x=X_train, y=y_train,
                           validation_data=(X_val, y_val),
                           epochs=epochs)

Note that X_train is a NumPy array of strings with shape (1280000,), and y_train is a NumPy array of shape (1280000, 1).
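
For reference, a common way to feed arrays like these to a TPU (a sketch, not from the original post) is a batched tf.data.Dataset with drop_remainder=True, since TPUs need fixed batch shapes:

# Sketch: fixed-shape batches; the ragged final batch is dropped
train_ds = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
            .shuffle(10000)
            .batch(128, drop_remainder=True)
            .prefetch(tf.data.AUTOTUNE))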

asked Dec 21 '25 by a_002311

1 Answer

I don't know exactly what changes you have made to the code, and I don't have details about your dataset. But I can see that you are trying to train the whole dataset in one epoch and are passing steps_per_epoch in directly. I would recommend writing it like this instead.

Set batch_size to a power of two (for example 16 or 32); if you don't want to batch the dataset, just set batch_size to 1:

batch_size = 16
steps_per_epoch = training_data_size // batch_size  # derive steps from the data

The problem with your code is most probably the training-dataset size: you're making a mistake by entering the dataset size manually as steps_per_epoch instead of computing the step count from it.

If you're loading the dataset from tfds, use (as shown in the link):

train_dataset, train_data_size = load_dataset_from_tfds(
  in_memory_ds, tfds_info, train_split, batch_size, bert_preprocess_model)

If you're using a custom dataset, store the size of the cleaned dataset in a variable and use that variable wherever the training-data size is needed; see the sketch below. Avoid hard-coding values as far as possible.
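
A minimal sketch of that advice (my assumption, reusing X_train, y_train, epochs, and the compiled bert_model from the question):

# Derive every size from the data instead of typing it in
batch_size = 16
training_data_size = len(X_train)                   # 1280000 in the question
steps_per_epoch = training_data_size // batch_size  # feeds the LR schedule

num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1 * num_train_steps)

history = bert_model.fit(x=X_train, y=y_train,
                         batch_size=batch_size,
                         validation_data=(X_val, y_val),
                         epochs=epochs)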

answered Dec 22 '25 by Chinmay


