
DeepLab: "Restoring from checkpoint failed" when training on own dataset

I am trying to train a DeepLab model on my own dataset, a subset of ADE20K from which I extracted a single object class. I want to use MobileNet as the backbone and start training from a pretrained model, so I downloaded the pretrained weights (mobilenet_v2_1.4_224) from https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet. Then I modified segmentation_dataset.py to include my dataset:

_ADE20K_DOORS_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 3530,
        'val': 353,
    },
    num_classes=2,
    ignore_label=255,
)

_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'ade20k_doors': _ADE20K_DOORS_INFORMATION,
}

In train.py, I changed the default values of the following flags:

flags.DEFINE_boolean('initialize_last_layer', False,
                     'Initialize the last layer.')
flags.DEFINE_boolean('last_layers_contain_logits_only', True,
                     'Only consider logits as last layers or not.')
flags.DEFINE_boolean('fine_tune_batch_norm', False,
                     'Fine tune the batch norm parameters or not.')

In train_utils.py, I excluded the logits from the list of variables to be restored:

from deeplab.model import LOGITS_SCOPE_NAME
exclude_list = ['global_step', LOGITS_SCOPE_NAME, 'logits']

Now when I try to train I get the following error:

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [576] rhs shape= [816]
[[Node: save/Assign_50 = Assign[T=DT_FLOAT, _class=["loc:@MobilenetV2/expanded_conv_11/expand/BatchNorm/beta"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MobilenetV2/expanded_conv_11/expand/BatchNorm/beta, save/RestoreV2:50)]]

Clearly, there is a mismatch between the pretrained checkpoint and my model. What am I missing? Could you please help me out? Any bit of help is much appreciated.
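
One way to locate such a mismatch is to compare the variable shapes stored in the checkpoint with those of the current graph. The following is a minimal sketch, assuming both sets of shapes have already been collected into plain dicts; the helper name find_shape_mismatches is hypothetical and not part of DeepLab. In TensorFlow, the checkpoint side could be obtained with dict(tf.train.list_variables(checkpoint_path)).

```python
def find_shape_mismatches(graph_shapes, checkpoint_shapes):
    """Return (name, graph_shape, checkpoint_shape) for variables present
    in both mappings whose shapes differ.

    Both arguments map variable names to shapes (lists or tuples of ints).
    Hypothetical helper for illustration, not part of DeepLab.
    """
    mismatches = []
    for name, graph_shape in sorted(graph_shapes.items()):
        ckpt_shape = checkpoint_shapes.get(name)
        # Variables missing from the checkpoint are skipped here; only
        # shared names with different shapes are reported.
        if ckpt_shape is not None and list(ckpt_shape) != list(graph_shape):
            mismatches.append((name, list(graph_shape), list(ckpt_shape)))
    return mismatches


# Toy data mirroring the variable named in the error message above.
graph_shapes = {
    "MobilenetV2/expanded_conv_11/expand/BatchNorm/beta": [576],
    "global_step": [],
}
checkpoint_shapes = {
    "MobilenetV2/expanded_conv_11/expand/BatchNorm/beta": [816],
    "global_step": [],
}

for name, graph_shape, ckpt_shape in find_shape_mismatches(
        graph_shapes, checkpoint_shapes):
    print(name, "graph:", graph_shape, "checkpoint:", ckpt_shape)
```

If the mismatches show up in backbone variables such as MobilenetV2/..., excluding only the logits from restoring cannot resolve them.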

In order to train I use the following command:

python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=30000 \
    --train_split="train" \
    --model_variant="mobilenet_v2" \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=513 \
    --train_crop_size=513 \
    --train_batch_size=1 \
    --dataset="ade20k_doors" \
    --tf_initial_checkpoint=deeplab/mobilenet/mobilenet_v2_1.4_224.ckpt \
    --train_logdir=deeplab/datasets/ADE20K/exp/train_on_train_set/train \
    --dataset_dir=deeplab/datasets/ADE20K/tfrecord
Denisa Stefan asked Aug 30 '18 10:08


1 Answer

I got rid of the error by switching to different pretrained weights. Training worked with this checkpoint: mobilenetv2_coco_voc_trainval
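
The shapes in the error message ([576] in the graph versus [816] in the checkpoint) are consistent with a width-multiplier mismatch: the mobilenet_v2_1.4_224 checkpoint was trained with a depth multiplier of 1.4, whereas the graph appears to have been built with 1.0. A minimal sketch of the arithmetic follows, assuming that expanded_conv_11 receives 96 channels at multiplier 1.0, that the expansion factor is 6, and that channel counts are rounded to a multiple of 8. The function names are illustrative and not taken from the DeepLab code.

```python
def make_divisible(value, divisor=8):
    """Round a channel count to a nearby multiple of `divisor`,
    without dropping more than 10% below the requested value."""
    rounded = max(divisor, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded


def expand_channels(base_input_channels, depth_multiplier, expansion=6):
    """Channel count of the 'expand' convolution in an inverted residual
    block, for a given width (depth) multiplier."""
    scaled_input = make_divisible(base_input_channels * depth_multiplier)
    return scaled_input * expansion


# Assumption: 96 input channels to expanded_conv_11 at multiplier 1.0.
BASE_INPUT_CHANNELS = 96

print(expand_channels(BASE_INPUT_CHANNELS, 1.0))  # 576, the graph's shape
print(expand_channels(BASE_INPUT_CHANNELS, 1.4))  # 816, the checkpoint's shape
```

Under these assumptions, a checkpoint trained with multiplier 1.0 has matching backbone shapes, which would explain why switching checkpoints removed the error.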

Denisa Stefan answered Nov 07 '22 05:11