Optimizing subgraph of large graph - slower than optimizing subgraph by itself

I have a very large TensorFlow graph and two sets of variables, A and B. I create two optimizers:

import tensorflow as tf

learning_rate = 1e-3
# Each optimizer only updates the variables in its own var_list.
optimizer1 = tf.train.AdamOptimizer(learning_rate).minimize(loss_1, var_list=var_list_1)
optimizer2 = tf.train.AdamOptimizer(learning_rate).minimize(loss_2, var_list=var_list_2)

The goal here is to iteratively optimize variables 1 and variables 2. The weights from variables 2 are used in the computation of loss 1, but they're not trainable when optimizing loss 1. Meanwhile, the weights from variables 1 are not used in optimizing loss 2 (I would say this is a key asymmetry).

I am finding, oddly, that running optimizer2 is much slower (about 2x) than if I were to optimize that part of the graph by itself. I'm not running any summaries.

Why is this happening? How can I fix it? I can provide more details if necessary.

asked Jan 17 '19 by user650261

1 Answer

I am guessing that this is a generative adversarial network, given the relationship between the losses and the parameters. It seems that the first group of parameters forms the generative model and the second group makes up the detector model.

If my guesses are correct, that would mean the second model uses the output of the first model as its input, so the first model's subgraph may be getting pulled into the second model's computation. Admittedly, I am much more familiar with PyTorch than with TF. There is a comment which I believe is saying that the first model could be included in the second graph, and I think this is true. In PyTorch I would implement something similar to the following. The most important part is creating a copy of generated_tensor that carries no graph:

# An arbitrary label
label = torch.tensor(1.0)

# Treat GenerativeModel as the model holding the first list of variables/parameters
generated_tensor = GenerativeModel(random_input_tensor)
# Treat DetectorModel as the model holding the second list of variables/parameters
detector_prediction = DetectorModel(generated_tensor)

# Detach the generated tensor so the copy carries no autograd graph
generated_tensor_copy = generated_tensor.detach()
detector_prediction_copy = DetectorModel(generated_tensor_copy)

# This loss optimizes the first model; the second model's ops are part of its
# graph, which is necessary for gradients to reach the first model.
loss1 = loss_func1(detector_prediction, label)
# This loss optimizes the second model; the first model is not in its graph.
loss2 = loss_func2(detector_prediction_copy, label)

I hope this is helpful. If anyone knows exactly how to do this in TF, that would be invaluable.
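As a rough, hedged sketch, the same detaching idea might translate to TF 1.x via tf.stop_gradient, which cuts the backward graph at the generator's output so that minimizing loss_2 never builds gradient ops through the first model. The generator, detector, loss_fn, random_input, and labels names below are placeholders, not anything from the question, and note that the forward pass through the first model still runs whenever optimizer2 is executed, so this only trims the backward work:

import tensorflow as tf

learning_rate = 1e-3

# Placeholder builder functions: generator() owns var_list_1, detector() owns var_list_2.
generated = generator(random_input)      # output of the first model
prediction = detector(generated)         # second model fed by the first

# Loss for the first variable set: the detector's ops stay in the gradient path,
# which is necessary for gradients to reach the generator's variables.
loss_1 = loss_fn(prediction, labels)

# Cut the graph at the generator's output so that minimizing loss_2 never
# backpropagates through (or builds gradient ops for) the generator.
generated_detached = tf.stop_gradient(generated)
prediction_detached = detector(generated_detached)
loss_2 = loss_fn(prediction_detached, labels)

optimizer1 = tf.train.AdamOptimizer(learning_rate).minimize(loss_1, var_list=var_list_1)
optimizer2 = tf.train.AdamOptimizer(learning_rate).minimize(loss_2, var_list=var_list_2)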

answered Nov 01 '22 by Peter Bergman