How should "BatchNorm" layer be used in caffe?

ResNets: `"BatchNorm"`+`"Scale"` (no parameter sharing)

Here the "BatchNorm" layer is immediately followed by a "Scale" layer:

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "bn2a_branch1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
}

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "scale2a_branch1"
    type: "Scale"
    scale_param {
        bias_term: true
    }
}

cifar10 example: only `"BatchNorm"`

In the cifar10 example provided with caffe, "BatchNorm" is used without any "Scale" following it:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}

cifar10: different `batch_norm_param` for `TRAIN` and `TEST`

Here, use_global_stats in batch_norm_param is switched between the TRAIN and TEST phases:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}

So what should it be?

How should one use the "BatchNorm" layer in caffe?


asked Jan 12 '17 08:01

Shai

2 Answers

If you follow the original paper, batch normalization should be followed by Scale and Bias layers (the bias can be included via the Scale layer, although this makes the bias parameters inaccessible). use_global_stats should also change from training (false) to testing/deployment (true), which is the default behavior. Note that the first example you give is a prototxt for deployment, so it is correct for it to be set to true.

I'm not sure about the shared parameters.

I made a pull request to improve the documentation on batch normalization, but then closed it because I wanted to modify it, and I never got back to it.

Note that I think lr_mult: 0 for "BatchNorm" is no longer required (perhaps not even allowed?), although I can't find the corresponding PR now.
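Putting these points together, a single train/test definition might look like the sketch below. It is illustrative rather than taken from any shipped example: the layer name "scale1" is made up, and the blob names simply reuse "pool1" and "bn1" from the question. use_global_stats is deliberately left unset so that Caffe picks it from the phase, and the lr_mult: 0 entries are omitted on the assumption that your Caffe version no longer needs them (older versions may still require them).

```prototxt
# Normalization only. use_global_stats is not set, so it follows the phase:
# false during TRAIN, true during TEST.
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
}
# Learned per-channel scale and shift applied to the normalized output.
# "scale1" is a hypothetical name chosen for this sketch.
layer {
  name: "scale1"
  type: "Scale"
  bottom: "bn1"
  top: "bn1"
  scale_param {
    bias_term: true
  }
}
```

For a deploy-only prototxt, setting use_global_stats: true explicitly, as in the ResNet example from the question, gives the same test-time behavior.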

answered Sep 18 '22 05:09

Jonathan

After each BatchNorm layer, we have to add a Scale layer in Caffe. The reason is that the Caffe BatchNorm layer only subtracts the mean from the input data and divides by the standard deviation (the square root of the variance); it does not include the γ and β parameters that respectively scale and shift the normalized distribution. In contrast, the Keras BatchNormalization layer includes and applies all of these parameters. Using a Scale layer with the parameter "bias_term" set to true in Caffe is a safe way to reproduce the exact behavior of the Keras version. Source: https://www.deepvisionconsulting.com/from-keras-to-caffe/
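In symbols, the split between the two Caffe layers can be written roughly as follows. Here μ and σ² are the per-channel mean and variance (mini-batch statistics or stored moving averages, depending on use_global_stats), and ε is a small constant added for numerical stability; this notation is introduced here for illustration and is not part of the original answer.

```latex
\begin{align}
\hat{x} &= \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} && \text{("BatchNorm")} \\
y &= \gamma\,\hat{x} + \beta && \text{("Scale" with bias\_term: true)}
\end{align}
```

The two steps composed give the usual batch normalization transform, which a Keras BatchNormalization layer applies in one layer.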

answered Sep 18 '22 05:09

Ehsan Akbari Tabar
