Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

load multiple models in Tensorflow

Tags:

tensorflow

I have written the following convolutional neural network (CNN) class in Tensorflow [I have tried to omit some lines of code for clarity.]

class CNN:
def __init__(self,
                num_filters=16,        # initial number of convolution filters
             num_layers=5,           # number of convolution layers
             num_input=2,           # number of channels in input
             num_output=5,          # number of channels in output
             learning_rate=1e-4,    # learning rate for the optimizer
             display_step = 5000,   # displays training results every display_step epochs
             num_epoch = 10000,     # number of epochs for training
             batch_size= 64,        # batch size for mini-batch processing
             restore_file=None,      # restore file (default: None)

            ):

                # define placeholders
                self.image = tf.placeholder(tf.float32, shape = (None, None, None,self.num_input))  
                self.groundtruth = tf.placeholder(tf.float32, shape = (None, None, None,self.num_output)) 

                # builds CNN and compute prediction
                self.pred = self._build()

                # I have already created a tensorflow session and saver objects
                self.sess = tf.Session()
                self.saver = tf.train.Saver()

                # also, I have defined the loss function and optimizer as
                self.loss = self._loss_function()
                self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.loss)

                if restore_file is not None:
                    print("model exists...loading from the model")
                    self.saver.restore(self.sess,restore_file)
                else:
                    print("model does not exist...initializing")
                    self.sess.run(tf.initialize_all_variables())

def _build(self):
    #builds CNN

def _loss_function(self):
    # computes loss


# 
def train(self, train_x, train_y, val_x, val_y):
    # uses mini batch to minimize the loss
    self.sess.run(self.optimizer, feed_dict = {self.image:sample, self.groundtruth:gt})


    # I save the session after n=10 epochs as:
    if epoch%n==0:
        self.saver.save(sess,'snapshot',global_step = epoch)

# finally my predict function is
def predict(self, X):
    return self.sess.run(self.pred, feed_dict={self.image:X})

I have trained two CNNs for two separate tasks independently. Each took around 1 day. Say, model1 and model2 are saved as 'snapshot-model1-10000' and 'snapshot-model2-10000' (with their corresponding meta files) respectively. I can test each model and compute its performance separately.

Now, I want to load these two models in a single script. I would naturally try to do as below:

cnn1 = CNN(..., restore_file='snapshot-model1-10000',..........) 
cnn2 = CNN(..., restore_file='snapshot-model2-10000',..........)

I encounter the error [The error message is long. I just copied/pasted a snippet of it.]

NotFoundError: Tensor name "Variable_26/Adam_1" not found in checkpoint files /home/amitkrkc/codes/A549_models/snapshot-hela-95000
     [[Node: save_1/restore_slice_85 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/restore_slice_85/tensor_name, save_1/restore_slice_85/shape_and_slice)]]

Is there a way to load from these two files two separate CNNs? Any suggestion/comment/feedback is welcome.

Thank you,

like image 691
Amit Avatar asked Feb 01 '17 21:02

Amit


People also ask

How do you continue training in Tensorflow?

To continue training a loaded model with checkpoints, we simply rerun the model. fit function with the callback still parsed. This however overwrites the currently saved best model, so make sure to change the checkpoint file path if this is undesired.


1 Answers

Yes there is. Use separate graphs.

g1 = tf.Graph()
g2 = tf.Graph()

with g1.as_default():
    cnn1 = CNN(..., restore_file='snapshot-model1-10000',..........) 
with g2.as_default():
    cnn2 = CNN(..., restore_file='snapshot-model2-10000',..........)

EDIT:

If you want them into same graph. You'll have to rename some variables. One idea is have each CNN in separate scope and let saver handle variables in that scope e.g.:

saver = tf.train.Saver(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES), scope='model1')

and in cnn wrap all your construction in scope:

with tf.variable_scope('model1'):
    ...

EDIT2:

Other idea is renaming variables which saver manages (since I assume you want to use your saved checkpoints without retraining everything. Saving allows different variable names in graph and in checkpoint, have a look at documentation for initialization.

like image 122
nmiculinic Avatar answered Oct 15 '22 04:10

nmiculinic