 

Keras: TPU models must have constant shapes for all operations

I am working with a pretrained Keras model and I want to run it on a TPU in Google Colaboratory, but I get the following error:

ValueError: Layer has a variable shape in a non-batch dimension. TPU models must have constant shapes for all operations.

You may have to specify 'input_length' for RNN/TimeDistributed layers.

Layer: Input shape: [(None, 128, 768), (None, 1)] Output shape: (None, None, 768)

I'm working with keras-xlnet. As I understand it, the TPU needs a fixed batch size when the model is compiled, as explained here and here.
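For reference, my understanding of a fully fixed-shape input in Keras is something like the generic tf.keras illustration below (BATCH_SIZE and SEQ_LEN are just my own constants, and the last dimension is the hidden size from the error message):

import tensorflow as tf

BATCH_SIZE = 8   # hypothetical value
SEQ_LEN = 128    # hypothetical value

# Pins every dimension, including the batch dimension, so the shape is
# (BATCH_SIZE, SEQ_LEN, 768) rather than (None, SEQ_LEN, 768).
fixed_input = tf.keras.layers.Input(shape=(SEQ_LEN, 768), batch_size=BATCH_SIZE)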

The model is loaded from checkpoint:

import os

from keras_xlnet import Tokenizer, load_trained_model_from_checkpoint, ATTENTION_TYPE_BI

# BATCH_SIZE and SEQ_LEN are constants defined earlier in my notebook

checkpoint_path = 'xlnet_cased_L-12_H-768_A-12'

tokenizer = Tokenizer(os.path.join(checkpoint_path, 'spiece.model'))
model = load_trained_model_from_checkpoint(
    config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
    checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
    batch_size=BATCH_SIZE,
    memory_len=512,
    target_len=SEQ_LEN,
    in_train_phase=False,
    attention_type=ATTENTION_TYPE_BI,
    )
model.summary()

The model is then compiled (after a few changes):

from keras_bert import AdamWarmup, calc_train_steps

decay_steps, warmup_steps = calc_train_steps(
    y_train.shape[0],
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    )


model.compile(
    AdamWarmup(decay_steps=decay_steps, warmup_steps=warmup_steps, lr=LR),
    loss='binary_crossentropy',
    )

Then the model is converted to a TPU model, which is where the error occurs:

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
strategy = tf.contrib.tpu.TPUDistributionStrategy(
    tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
)

with tf.keras.utils.custom_object_scope(get_custom_objects()):
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(model, strategy=strategy)

Is there a way I can fix my batch size at compile time to get rid of the error above? Or is the problem something else entirely?

asked Oct 29 '19 by chefhose

1 Answer

I agree with the comments: to get this to work you would need to change the variable output shapes (e.g. (None, None, 768)) to fixed sizes in every dimension except the first (batch) dimension. You might be able to do this with simple padding. If you can loop through the saved model's layers and load their weights into a new model that you define with padded dimensions, it may even work. But I would say that's more trouble than it's worth, given that TPU-ready versions of this model already exist.
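If you do want to try that route, the general idea is something like the sketch below. It is not a drop-in solution: build_fixed_shape_model is a hypothetical helper that would rebuild the same architecture with every non-batch dimension fixed, and the weight copy assumes matching layer names.

fixed_model = build_fixed_shape_model()  # hypothetical rebuild with constant shapes

for layer in model.layers:
    try:
        # Copy weights layer by layer, matched by name.
        fixed_model.get_layer(layer.name).set_weights(layer.get_weights())
    except ValueError:
        print('Skipping layer with missing or mismatched weights:', layer.name)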

I suggest moving away from Keras for this model. The official TensorFlow XLNet implementation should work on TPUs without modification, and it comes with pretrained checkpoints: https://github.com/zihangdai/xlnet

It uses the standard TPUEstimator class to send the model function to the TPU workers, so you won't need to mess around with tf.contrib.tpu.keras_to_tpu_model.
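For orientation, the TPUEstimator plumbing that run_classifier.py handles for you looks roughly like the sketch below (TF 1.x contrib APIs; my_model_fn is just a stub and the bucket path is a placeholder):

import os
import tensorflow as tf

def my_model_fn(features, labels, mode, params):
    # Placeholder: a real model_fn builds the network and returns a
    # tf.contrib.tpu.TPUEstimatorSpec for the given mode.
    raise NotImplementedError

resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])

run_config = tf.contrib.tpu.RunConfig(
    cluster=resolver,
    model_dir='gs://my-bucket/exp/imdb',  # placeholder bucket path
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=500))

estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=my_model_fn,
    config=run_config,
    use_tpu=True,
    train_batch_size=32,
    eval_batch_size=8)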

The example given in the repository can be run in Colab, where $TPU_NAME is $COLAB_TPU_ADDR, as long as you upload the pretrained checkpoints and the IMDB data to a bucket that Colab can access.

python run_classifier.py \
  --use_tpu=True \
  --tpu=${TPU_NAME} \
  --do_train=True \
  --do_eval=True \
  --eval_all_ckpt=True \
  --task_name=imdb \
  --data_dir=${IMDB_DIR} \
  --output_dir=${GS_ROOT}/proc_data/imdb \
  --model_dir=${GS_ROOT}/exp/imdb \
  --uncased=False \
  --spiece_model_file=${LARGE_DIR}/spiece.model \
  --model_config_path=${GS_ROOT}/${LARGE_DIR}/model_config.json \
  --init_checkpoint=${GS_ROOT}/${LARGE_DIR}/xlnet_model.ckpt \
  --max_seq_length=512 \
  --train_batch_size=32 \
  --eval_batch_size=8 \
  --num_hosts=1 \
  --num_core_per_host=8 \
  --learning_rate=2e-5 \
  --train_steps=4000 \
  --warmup_steps=500 \
  --save_steps=500 \
  --iterations=500
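In a Colab notebook you would set the shell variables used above before running the command with !, roughly like this (the bucket and directory names are placeholders you would replace with your own):

import os

os.environ['TPU_NAME'] = 'grpc://' + os.environ['COLAB_TPU_ADDR']
os.environ['GS_ROOT'] = 'gs://my-bucket'                      # placeholder
os.environ['IMDB_DIR'] = os.environ['GS_ROOT'] + '/aclImdb'   # placeholder
os.environ['LARGE_DIR'] = 'xlnet_cased_L-24_H-1024_A-16'      # placeholder

These environment variables are inherited by the shell that Colab spawns for the ! command, so ${TPU_NAME} and friends expand as expected.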
answered Nov 16 '22 by Tyler