I am working with a pretrained Keras model and I want to run it on a TPU in Google Colaboratory, but I get the following error:
ValueError: Layer has a variable shape in a non-batch dimension. TPU models must have constant shapes for all operations.
You may have to specify 'input_length' for RNN/TimeDistributed layers.
Layer: Input shape: [(None, 128, 768), (None, 1)] Output shape: (None, None, 768)
I'm working with keras-xlnet. As I understand it, the TPU needs a fixed batch size when the model is compiled, as explained here and here.
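To illustrate what the error is complaining about, here is a minimal sketch (plain tf.keras, not keras-xlnet): keras_to_tpu_model tolerates None only in the batch dimension, so any other unknown dimension, such as a variable sequence length, produces a shape like the (None, None, 768) in the message above.
import tensorflow as tf

# Fixed sequence length -> every non-batch dimension is static, TPU conversion is happy.
inp = tf.keras.layers.Input(shape=(128, 768))
out = tf.keras.layers.Dense(1)(inp)
print(tf.keras.Model(inp, out).output_shape)   # (None, 128, 1)

# Variable sequence length -> output shape (None, None, 1), which keras_to_tpu_model rejects.
inp = tf.keras.layers.Input(shape=(None, 768))
out = tf.keras.layers.Dense(1)(inp)
print(tf.keras.Model(inp, out).output_shape)   # (None, None, 1)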
The model is loaded from a checkpoint:
import os
from keras_xlnet import Tokenizer, load_trained_model_from_checkpoint, ATTENTION_TYPE_BI

checkpoint_path = 'xlnet_cased_L-12_H-768_A-12'
tokenizer = Tokenizer(os.path.join(checkpoint_path, 'spiece.model'))

# BATCH_SIZE and SEQ_LEN are constants defined elsewhere in the notebook
model = load_trained_model_from_checkpoint(
    config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
    checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
    batch_size=BATCH_SIZE,
    memory_len=512,
    target_len=SEQ_LEN,
    in_train_phase=False,
    attention_type=ATTENTION_TYPE_BI,
)
model.summary()
The model is then compiled (after a few changes):
from keras_bert import AdamWarmup, calc_train_steps

# EPOCHS and LR are constants defined elsewhere in the notebook
decay_steps, warmup_steps = calc_train_steps(
    y_train.shape[0],
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
)

model.compile(
    AdamWarmup(decay_steps=decay_steps, warmup_steps=warmup_steps, lr=LR),
    loss='binary_crossentropy',
)
Then the model is converted for the TPU, which is where the error occurs:
import tensorflow as tf

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
strategy = tf.contrib.tpu.TPUDistributionStrategy(
    tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
)

with tf.keras.utils.custom_object_scope(get_custom_objects()):
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(model, strategy=strategy)
Is there a way I can fix my batch size at compile time to get rid of the error above? Or is the problem something entirely different?
I agree with the comments - to get it to work you would need to adjust the various variable output shapes (e.g. (None, None, 768)) to fixed sizes, other than the first, batch dimension. Maybe you could do this with simple padding. If you can loop through the saved model's layers and load their weights into a new model that you define with padded dimensions, it may even work; a rough sketch of that idea is below. I would say that's more trouble than it's worth, considering TPU-ready versions are already available.
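A very rough sketch of that weight-copying idea, assuming build_fixed_shape_xlnet is a hypothetical function you would write yourself, re-declaring the architecture with every non-batch dimension pinned; set_weights only works where the re-declared layers keep the same parameter shapes:
# Hypothetical: rebuild the architecture with fixed (padded) shapes, then copy weights over.
fixed_model = build_fixed_shape_xlnet(batch_size=BATCH_SIZE, seq_len=SEQ_LEN)  # you write this

for src_layer, dst_layer in zip(model.layers, fixed_model.layers):
    weights = src_layer.get_weights()
    if weights:                          # skip weight-less layers (Input, Dropout, ...)
        dst_layer.set_weights(weights)   # requires matching weight shapes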
I suggest moving away from Keras for this model. The official TensorFlow XLNet implementation should work with TPUs without modification, and it also comes with pre-trained checkpoints: https://github.com/zihangdai/xlnet
It uses the standard TPUEstimator class to send the model function to the TPU worker, so you won't need to mess around with tf.contrib.tpu.keras_to_tpu_model.
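For reference, the TPUEstimator wiring that run_classifier.py handles for you looks roughly like this in TF 1.x; this is a sketch, not the actual XLNet code, and model_fn, train_input_fn and the bucket path are placeholders:
import os
import tensorflow as tf

resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])

run_config = tf.contrib.tpu.RunConfig(
    cluster=resolver,
    model_dir='gs://your-bucket/exp/imdb',     # placeholder GCS path
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=500),
)

estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn,                 # must return a TPUEstimatorSpec
    config=run_config,
    use_tpu=True,
    train_batch_size=32,               # fixed batch size, as on any TPU setup
)
estimator.train(input_fn=train_input_fn, max_steps=4000)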
The example given in the repository can be run in Colab, where $TPU_NAME is $COLAB_TPU_ADDR and you upload the pretrained checkpoints and the IMDB data to a bucket Colab can access:
python run_classifier.py \
  --use_tpu=True \
  --tpu=${TPU_NAME} \
  --do_train=True \
  --do_eval=True \
  --eval_all_ckpt=True \
  --task_name=imdb \
  --data_dir=${IMDB_DIR} \
  --output_dir=${GS_ROOT}/proc_data/imdb \
  --model_dir=${GS_ROOT}/exp/imdb \
  --uncased=False \
  --spiece_model_file=${LARGE_DIR}/spiece.model \
  --model_config_path=${GS_ROOT}/${LARGE_DIR}/model_config.json \
  --init_checkpoint=${GS_ROOT}/${LARGE_DIR}/xlnet_model.ckpt \
  --max_seq_length=512 \
  --train_batch_size=32 \
  --eval_batch_size=8 \
  --num_hosts=1 \
  --num_core_per_host=8 \
  --learning_rate=2e-5 \
  --train_steps=4000 \
  --warmup_steps=500 \
  --save_steps=500 \
  --iterations=500
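In a Colab notebook, the environment that command assumes can be prepared roughly as follows; this is a sketch, and the bucket and directory names are placeholders for your own:
import os
from google.colab import auth

auth.authenticate_user()   # lets the job read/write your GCS bucket

os.environ['TPU_NAME']  = 'grpc://' + os.environ['COLAB_TPU_ADDR']
os.environ['GS_ROOT']   = 'gs://your-xlnet-bucket'               # placeholder bucket
os.environ['IMDB_DIR']  = 'aclImdb'                              # extracted IMDB data
os.environ['LARGE_DIR'] = 'xlnet_cased_L-24_H-1024_A-16'         # pretrained checkpoint dir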