I was looking at CycleGAN's official PyTorch implementation, and there the author chained the parameters of both networks and used a single optimizer for both of them. How does this work? Is it better than using two different optimizers for the two networks?
from itertools import chain
import torch

all_params = chain(module_a.parameters(), module_b.parameters())
optimizer = torch.optim.Adam(all_params)
From the itertools.chain documentation (https://docs.python.org/3/library/itertools.html#itertools.chain):

itertools.chain(*iterables)
Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted.
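For example, chaining two lists simply yields the elements of the first followed by the elements of the second, which is exactly what happens with the two parameters() iterators above:

from itertools import chain

combined = chain([1, 2], [3, 4])
print(list(combined))  # [1, 2, 3, 4]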
Since parameters() returns an iterable, chaining the two iterables lets a single optimizer update the parameters of both networks (Modules) at once: one call to optimizer.step() updates everything, and the same hyperparameters (learning rate, betas, etc.) apply to all of the parameters. If you use two different optimizers instead, the parameters of the two networks are optimized separately, each with its own optimizer configuration and step.
If you have a composite network whose sub-networks are trained together on the same loss, you need to update all of their parameters in the same step, so using a single optimizer over all of them is the way to go.
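Here is a minimal sketch (the sub-modules, data, and loss below are hypothetical placeholders, not taken from the CycleGAN code) showing that with one optimizer a single zero_grad()/step() pair updates the parameters of both networks:

import torch
import torch.nn as nn
from itertools import chain

# Hypothetical sub-networks that together form one composite model
module_a = nn.Linear(10, 10)
module_b = nn.Linear(10, 1)

optimizer = torch.optim.Adam(
    chain(module_a.parameters(), module_b.parameters()), lr=2e-4
)

x = torch.randn(8, 10)       # dummy input
target = torch.randn(8, 1)   # dummy target

# One combined loss; a single step() updates the parameters of both modules
loss = nn.functional.mse_loss(module_b(module_a(x)), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

Note that chain() produces a one-shot iterator; the optimizer stores the parameters in its param groups when it is constructed, so this works, but you cannot iterate over all_params again afterwards. If you want different hyperparameters per network (for example different learning rates), you can instead pass the optimizer a list of param-group dicts, one per network.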