So I am using this Transformer implementation for my project: https://github.com/Kyubyong/transformer . It works great on the German-to-English translation it was originally written for, and I modified the preprocessing Python script to create vocabulary files for the languages I want to translate. That part seems to work fine.
However, when it comes to training I get the following error:
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [9796,512] rhs shape= [9786,512] [[{{node save/Assign_412}} = Assign[T=DT_FLOAT, _class=["loc:@encoder/enc_embed/lookup_table"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](encoder/enc_embed/lookup_table/Adam_1, save/RestoreV2:412)]]
Now I have no idea why I am getting the above error. I also reverted to the original German-to-English code, and now I get the same error there too (except the lhs and rhs tensor shapes are different, of course), even though it was working before!
Any ideas on why this could be happening?
Thanks in advance
EDIT: The specific file in question is train.py, run as-is: https://github.com/Kyubyong/transformer/blob/master/train.py . Nothing has been modified other than the fact that the vocab files loaded for de and en are different (they are in fact vocab files with single letters as words). However, as I mentioned, even when reverting to the previously working example I get the same error with different lhs and rhs dimensions.
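For context on why reverting the code does not help: the shape-mismatch error usually means an old checkpoint is still sitting in the training log directory, saved when the embedding table had a different vocabulary size, and train.py tries to restore it. A minimal sketch of clearing that directory before retraining (assuming the default "logdir" path used by this repo; adjust if yours differs):

```python
import os
import shutil

def reset_logdir(logdir="logdir"):
    """Delete stale checkpoints so training starts from scratch.

    If a checkpoint saved with a different vocabulary size remains in the
    log directory, restoring it fails with an Assign shape mismatch like
    lhs=[9796,512] vs rhs=[9786,512]. Removing the directory forces a
    fresh start with freshly initialized variables.
    """
    if os.path.isdir(logdir):
        shutil.rmtree(logdir)   # drop old checkpoints and event files
    os.makedirs(logdir)         # recreate an empty directory for new runs

if __name__ == "__main__":
    reset_logdir("logdir")
```

This is only a sketch of the "start from a clean directory" idea, not part of the repo itself; you could just as well delete the directory by hand.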
I was facing the same issue while exporting/saving the model. I was following the example given at this URL: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md
There are three things you have to make sure are correct if you are facing the above issue:
Clean up the model directory and extract a fresh model.
Make sure you are using the correct pair of pipeline-config file and its corresponding TF model.
Use the correct model checkpoint (see the example below).
I updated my TRAINED_CKPT_PREFIX value to my model's save point and it worked for me:
TRAINED_CKPT_PREFIX=./data/model.ckpt-139
In your case, use your own save-point number; in my case it is 139.
Previously I was using ./data/model.ckpt only, which was not working.
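To find the right save-point number, you can look for the model.ckpt-N.index files that TensorFlow writes alongside each checkpoint. A small sketch (list_save_points is a hypothetical helper, not part of either repo; it only inspects file names):

```python
import glob
import os
import re

def list_save_points(data_dir):
    """Return the sorted global-step numbers of checkpoints found in data_dir.

    Each TensorFlow save point N produces model.ckpt-N.index, model.ckpt-N.meta
    and model.ckpt-N.data-* files; the step number is parsed from the .index
    file name.
    """
    steps = set()
    for path in glob.glob(os.path.join(data_dir, "model.ckpt-*.index")):
        match = re.search(r"model\.ckpt-(\d+)\.index$", path)
        if match:
            steps.add(int(match.group(1)))
    return sorted(steps)

if __name__ == "__main__":
    print(list_save_points("./data"))
```

If this prints [139], you would set TRAINED_CKPT_PREFIX=./data/model.ckpt-139 as above.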