InvalidArgumentError: Mismatch between the current graph and the graph from the checkpoint

So I am basically using this transformer implementation for my project (https://github.com/Kyubyong/transformer). It works great on the German-to-English translation it was originally written for. I modified the preprocessing Python script to create vocabulary files for the languages I want to translate, and that part seems to work fine.

However when it comes to training I get the following error:

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [9796,512] rhs shape= [9786,512] [[{{node save/Assign_412}} = Assign[T=DT_FLOAT, _class=["loc:@encoder/enc_embed/lookup_table"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](encoder/enc_embed/lookup_table/Adam_1, save/RestoreV2:412)]]

I have no idea why I am getting this error. I also reverted to the original German-to-English code, and now I get the same error there too (with different lhs and rhs tensor shapes, of course), even though it was working before!
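To read the message: in TensorFlow's `Assign` op the lhs is the variable in the graph you have just built, and the rhs is the value restored from the checkpoint (`save/RestoreV2:412`). The helper below is a hypothetical sketch, not part of TensorFlow or the repo. It pulls both shapes out of the message and lists the axes that differ. For the error above only axis 0 differs (9796 in the graph vs 9786 in the checkpoint). Assuming the lookup table is laid out as [vocab_size, hidden_units], that points at a change in vocabulary size rather than in model width.

```python
import re
from typing import List, Optional, Tuple

Shape = List[int]

# Matches e.g. "lhs shape= [9796,512] rhs shape= [9786,512]"
_ASSIGN_RE = re.compile(r"lhs shape= \[([\d,\s]*)\] rhs shape= \[([\d,\s]*)\]")


def _to_shape(text: str) -> Shape:
    """Turn '9796,512' into [9796, 512]; an empty string becomes [] (a scalar)."""
    return [int(dim) for dim in text.split(",") if dim.strip()]


def parse_assign_mismatch(message: str) -> Optional[Tuple[Shape, Shape]]:
    """Return (graph_shape, checkpoint_shape), or None if the message does not match."""
    match = _ASSIGN_RE.search(message)
    if match is None:
        return None
    return _to_shape(match.group(1)), _to_shape(match.group(2))


def differing_axes(graph_shape: Shape, checkpoint_shape: Shape) -> List[int]:
    """Indices of axes whose sizes differ; a rank mismatch flags every axis."""
    if len(graph_shape) != len(checkpoint_shape):
        return list(range(max(len(graph_shape), len(checkpoint_shape))))
    return [i for i, (g, c) in enumerate(zip(graph_shape, checkpoint_shape)) if g != c]


if __name__ == "__main__":
    msg = ("Assign requires shapes of both tensors to match. "
           "lhs shape= [9796,512] rhs shape= [9786,512]")
    graph_shape, ckpt_shape = parse_assign_mismatch(msg)
    print(graph_shape, ckpt_shape, differing_axes(graph_shape, ckpt_shape))
    # prints: [9796, 512] [9786, 512] [0]
```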

Any ideas on why this could be happening?

Thanks in advance

EDIT: The specific file in question is train.py (https://github.com/Kyubyong/transformer/blob/master/train.py), which fails when it is run. Nothing has been modified except that the vocab files loaded for de and en are different (they are in fact vocab files with single letters as words). However, as I mentioned, even when I revert to the previously working example I get the same error with different lhs and rhs dimensions.
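Since the mismatch is on the first axis of the embedding table, one check is to count how many entries the vocabulary you are loading now actually yields and compare that with the checkpoint's row count. The sketch below assumes a tab-separated `token<TAB>count` vocab file in which only tokens with count >= `min_cnt` are kept. Both the format and the `min_cnt` filter are assumptions, so check them against the repo's real vocab loader (including any special tokens it adds). `count_vocab_rows`, `count_vocab_file` and `compare_rows` are hypothetical helpers.

```python
from typing import Iterable


def count_vocab_rows(lines: Iterable[str], min_cnt: int = 0) -> int:
    """Count 'token<TAB>count' lines whose count is >= min_cnt (assumed format)."""
    rows = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        if int(fields[1]) >= min_cnt:
            rows += 1
    return rows


def count_vocab_file(path: str, min_cnt: int = 0) -> int:
    """Same count, read from a UTF-8 vocab file on disk."""
    with open(path, encoding="utf-8") as handle:
        return count_vocab_rows(handle, min_cnt)


def compare_rows(vocab_rows: int, checkpoint_rows: int) -> str:
    """Report whether the vocab size matches the checkpoint's embedding rows."""
    if vocab_rows == checkpoint_rows:
        return f"OK: vocab yields {vocab_rows} rows, same as the checkpoint"
    return (f"MISMATCH: vocab yields {vocab_rows} rows but the checkpoint has "
            f"{checkpoint_rows}; it was probably saved with a different vocab")


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python check_vocab.py VOCAB_FILE CHECKPOINT_ROWS [MIN_CNT]
    if len(sys.argv) >= 3:
        min_cnt = int(sys.argv[3]) if len(sys.argv) > 3 else 0
        rows = count_vocab_file(sys.argv[1], min_cnt)
        print(compare_rows(rows, int(sys.argv[2])))
```

If the counts differ, the checkpoint in the training directory was saved with a different vocabulary than the one the graph is now being built from.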

noob asked Oct 24 '18 20:10


1 Answer

I was facing the same issue while exporting/saving the model. I was following the example at this URL: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md

If you are facing the above issue, make sure these three things are correct:

  1. Clean up the model directory and extract a fresh model.

  2. Make sure you are using the correct pairing of pipeline config file and its corresponding TF model.

  3. Use the correct model checkpoint (see the example below).
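For point 1: if your script restores from whatever checkpoint sits in its model/log directory (the "Restoring from checkpoint failed" message suggests it does), moving the old checkpoints aside before retraining with a new vocabulary keeps anything stale from being picked up. `archive_checkpoints` is a hypothetical helper, not part of TensorFlow or either repo. It moves the `checkpoint` state file and any `*.ckpt*` files into a timestamped backup folder instead of deleting them. The file-name patterns are assumptions about how your checkpoints are named.

```python
import shutil
import time
from pathlib import Path
from typing import List, Sequence

# Assumed naming: a 'checkpoint' state file plus files like model.ckpt-139.index
DEFAULT_PATTERNS = ("checkpoint", "*.ckpt*")


def archive_checkpoints(model_dir: str,
                        patterns: Sequence[str] = DEFAULT_PATTERNS) -> List[Path]:
    """Move matching files from model_dir into model_dir/stale-<timestamp>/.

    Only files directly inside model_dir are considered (no recursion).
    Returns the new paths of the moved files, or [] if nothing matched.
    """
    root = Path(model_dir)
    matches = sorted({p for pattern in patterns
                      for p in root.glob(pattern) if p.is_file()})
    if not matches:
        return []
    backup = root / time.strftime("stale-%Y%m%d-%H%M%S")
    backup.mkdir(exist_ok=True)
    moved = []
    for path in matches:
        target = backup / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Call it once on your (hypothetical) model directory before starting a run with a new vocabulary. Simply deleting the directory works too if you do not need the old checkpoints.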

I updated my TRAINED_CKPT_PREFIX value to point at my model's saved checkpoint, and it worked for me (see the example below):

TRAINED_CKPT_PREFIX=./data/model.ckpt-139

In your case, use your own checkpoint step number; in my case it is 139.

Previously I was using only ./data/model.ckpt, which did not work.
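The `-139` suffix is the global step that TensorFlow appends when a checkpoint is saved with a step number, and the files on disk share that prefix (e.g. `model.ckpt-139.index`). TensorFlow's own `tf.train.latest_checkpoint(dir)` returns the newest prefix by reading the `checkpoint` state file. If you would rather pick the prefix from the files themselves, here is a minimal pure-Python sketch. `latest_ckpt_prefix` is a hypothetical helper, and it assumes the `<base>-<step>.index` naming shown in this answer.

```python
import re
from pathlib import Path
from typing import Optional


def latest_ckpt_prefix(model_dir: str, base: str = "model.ckpt") -> Optional[str]:
    """Return '<model_dir>/<base>-<step>' for the highest step found, or None."""
    pattern = re.compile(re.escape(base) + r"-(\d+)\.index$")
    best_step, best_name = -1, None
    for path in Path(model_dir).iterdir():
        match = pattern.match(path.name)
        # Compare steps numerically: as strings, "99" would sort above "139".
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_name = path.name[: -len(".index")]
    return None if best_name is None else str(Path(model_dir) / best_name)
```

The returned string is the prefix form that TRAINED_CKPT_PREFIX expects, without any `.index`/`.data-*` extension.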

siraj pathan answered Sep 21 '22 19:09