I have read several pieces of code that do layer initialization using PyTorch's nn.init.kaiming_normal_(). Some of them use the fan_in mode, which is the default. Of the many examples, one can be found here and is shown below.
init.kaiming_normal(m.weight.data, a=0, mode='fan_in')
However, sometimes I see people using the fan_out mode, as seen here and shown below.
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
Can someone give me some guidelines or tips to help me decide which mode to select? Furthermore, I am working on image super-resolution and denoising tasks in PyTorch; which mode would be more beneficial for those?
Kaiming Initialization, or He Initialization, is an initialization method for neural networks that takes into account the non-linearity of activation functions such as ReLU. A proper initialization method should avoid reducing or magnifying the magnitudes of input signals exponentially.
PyTorch has built-in weight initialization which works quite well, so usually you do not have to worry about it. You can check the default initialization of the Conv and Linear layers yourself, for example as sketched below.
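A minimal sketch of such a check (the layer sizes are arbitrary; as far as I know, recent PyTorch versions initialize Conv2d and Linear weights with kaiming_uniform_ and a=sqrt(5), which works out to roughly U(-1/sqrt(fan_in), 1/sqrt(fan_in))):

import math
import torch.nn as nn

conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)

# fan_in of a conv weight is in_channels * kernel_height * kernel_width
fan_in = conv.in_channels * conv.kernel_size[0] * conv.kernel_size[1]
bound = 1.0 / math.sqrt(fan_in)  # expected bound of the default uniform init

print(f"expected bound:   {bound:.4f}")
print(f"observed max |w|: {conv.weight.abs().max().item():.4f}")  # should be <= bound
print(f"observed std:     {conv.weight.std().item():.4f}")        # roughly bound / sqrt(3)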
According to the documentation:
Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
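To make the distinction concrete, here is a small sketch (the tensor shape is arbitrary) showing that the two modes only differ in which fan enters std = gain / sqrt(fan), so a layer with unequal input and output channels gets a different scale in each mode:

import torch
import torch.nn as nn

# weight of shape (out_channels, in_channels, kH, kW) = (128, 64, 3, 3)
w = torch.empty(128, 64, 3, 3)

nn.init.kaiming_normal_(w, mode='fan_in', nonlinearity='relu')
print(w.std().item())   # about sqrt(2 / (64 * 3 * 3))  ~= 0.059

nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
print(w.std().item())   # about sqrt(2 / (128 * 3 * 3)) ~= 0.042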
and according to Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015):
We note that it is sufficient to use either Eqn.(14) or Eqn.(10)
where Eqn.(10) and Eqn.(14) correspond to fan_in and fan_out respectively. Furthermore:
This means that if the initialization properly scales the backward signal, then this is also the case for the forward signal; and vice versa. For all models in this paper, both forms can make them converge
so all in all it does not matter much, and it is more about what you are after. I assume that if you suspect your backward pass might be more "chaotic" (greater variance), it is worth changing the mode to fan_out. This might happen when the loss oscillates a lot (e.g. very easy examples followed by very hard ones).
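As a rough empirical check of the quoted property (the depth and width below are arbitrary), a stack of conv/ReLU blocks initialized with fan_in keeps the forward activations at a roughly constant scale instead of shrinking or exploding with depth:

import torch
import torch.nn as nn

torch.manual_seed(0)
layers = []
for _ in range(20):
    conv = nn.Conv2d(64, 64, 3, padding=1, bias=False)
    nn.init.kaiming_normal_(conv.weight, mode='fan_in', nonlinearity='relu')
    layers += [conv, nn.ReLU()]
net = nn.Sequential(*layers)

x = torch.randn(8, 64, 32, 32)
with torch.no_grad():
    # the output scale stays O(1) rather than growing or vanishing with depth
    print(x.std().item(), net(x).std().item())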
Correct choice of nonlinearity is more important, where nonlinearity is the activation you are using after the layer you are currently initializing. Current defaults set it to leaky_relu with a=0, which is effectively the same as relu. If you are actually using leaky_relu, you should change a to its negative slope, for example as sketched below.
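A minimal sketch of that (the slope 0.2 and layer sizes are just placeholders):

import torch.nn as nn

negative_slope = 0.2  # must match the slope of the LeakyReLU actually used

def init_weights(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, a=negative_slope,
                                mode='fan_in', nonlinearity='leaky_relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.LeakyReLU(negative_slope),
    nn.Conv2d(64, 3, 3, padding=1),
)
model.apply(init_weights)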